Did They Sense it Coming? A Pipelined Approach for Tsunami Prediction Based on Aquatic Behavior Using Ensemble Clustering and Fuzzy Rule-Based Classification

Tsunamis are among the natural disasters most feared by humanity. Designing an early and effective Tsunami Warning System (TWS) is an immediate goal of the scientific community. Underwater seismic responses sensed by different numerical techniques have resulted in various warning frameworks that have proved successful in predicting tsunamis. However, there have been many instances in the past where these warning systems failed to generate alerts in time, raising the need to design more efficient, diverse, and multidisciplinary warning methods or systems. Therefore, we propose a sequenced approach (ECGFC) for designing a TWS, based on Ensemble Clustering (ECG) and Classification for categorizing anomalous behavior in response to seismic perturbations, taking three aquatic animal behavioral datasets (Turtle, Earthworm, and Fish) as input. ECG uses an existing state-of-the-art method bagged with a Gaussian mixture model to label the dynamically changing behavioral data. The paper compares the results of the clustering ensemble with baseline clustering methods on the three behavioral datasets as well as four benchmark datasets. Finally, the proposed sequenced method (ECGFC) is compared on three classification error metrics (MSE, MAE, and SMAPE) on the behavioral datasets and against existing ensemble frameworks in the state of the art.


I. INTRODUCTION
The 2004 Indian Ocean tsunami, popularly known as the "Boxing Day Tsunami", was one of the most devastating events in the history of disaster science, caused by high underwater seismic activity. With the epicenter of the event off the west coast of Sumatra in the Indian Ocean, Indian communities suffered tremendous loss of life and property [1]. Since repeated occurrences of tsunamis have affected countries like India, Sri Lanka, Japan, Thailand, and Indonesia, scientists and practitioners are taking cues from pre- and post-event analysis of various tsunamis. This analysis has been presented in the form of analytical studies, algorithms, methods, simulations, and models describing the occurrence, prediction, or impact of such events. Different Tsunami Warning Systems (TWS) have been proposed, developed in research, and then deployed to predict seismic perturbations. The state of the art has broadly classified these systems as Physiological (based on geophysics) [2]-[4], Societal (based on inter-human interaction) [5]-[7], and Nature-based (based on ecology) [8], [9]. While the former two primarily rely on predicting seismic signals based on mathematical and computational analysis, the latter focuses on analyzing societal as well as nature's response towards seismic disturbances. Despite the various TWS, instances and observations in the literature show how this deadly event has incurred loss to humanity and nature [10]-[12]. These observations underscore that TWS have struggled to produce timely warnings in the past. Accordingly, as indicated by the authors in [13], there is a need for advancement in tsunami science, where approaches spanning various disciplines, i.e. geophysics and ecology, can be more successful when appropriate.
The associate editor coordinating the review of this manuscript and approving it for publication was Seifedine Kadry.
Ecology, including plant and animal populations, has responded with unusual signals to changing underwater conditions under seismic tremors. Animals can act as bio-sensors to predict natural disasters such as tsunamis [14]. Reports [15]-[17] debated by researchers and scientists emphasize the presence of such signal or response production in aquatic animals towards seismic perturbations. Marine animals use their sensory receptors to navigate and breed. Varying underwater conditions, such as oceanic flow and changing electromagnetic conditions across the flow, can therefore affect or disrupt these sensory receptions [18]. Hence, the abnormal behavioral patterns, in the form of unprocessed data, can be tapped in real time to help generate alarms after precise analysis.
The literature cited above indicates that there are signs of anomaly in the behavior of underwater species whenever seismic perturbations reach the sea bed [19]-[21]. To analyze these signs, where the labels of the unprocessed data are unknown, computational intelligence analysis is needed. Hence, efficient machine learning paradigms [22], [23] can be used to classify such anomalous biological behavior datasets and help generate timely alerts. Here, an alert refers to a warning that can assist in the execution of safety measures and thereby avoid loss of life and property. Recently, a contribution [24] that reviews the impact of machine learning techniques on modeling complex behavioral data has attracted attention.
Machine learning techniques are categorized into two types: supervised and unsupervised. Clustering is one of the most critical unsupervised learning techniques and has found wide application in data analysis [25]. The goal of clustering is to partition a dataset into several groups, such that data samples in the same group are more similar to each other than to those in different groups [26]. Even after many improvements over existing base methods, various contributions in the state of the art have shown that combining the strengths of various clustering methods, termed ensemble clustering (EC), can provide better insights for data analysis [27]. EC is a process of combining a re-defined clustering method to obtain well-defined and crisp partitions based on a weighted function [28].
The first aim of this work is to propose and evaluate an ensemble clustering algorithm (ECG) based on a modified threshold K-means (ITKM) followed by a bagging-based ensemble of a Gaussian Mixture Model (GMM). For bagging, neural training is used that enables a weighted cluster head selection on three dynamic multi-source datasets (MSDs) coming from different aquatic species, viz. Sea Turtles (D1), Earthworms (D2), and Fish (D3). Multi-source data (MSD) is data taken from multiple sources and integrated for further analysis and inference. The three input MSDs in this work are prepared by tapping the indicative behavioral attributes in response to the geophysical data, which are merged under a common timestamp. The motive behind such aggregation of information is to underline unlabeled trends of parallel changes in the behavior of aquatic species and varying geophysical conditions. This unlabeled data can be labeled and analyzed using ensemble clustering, where soft clustering methods like GMM coupled with weight-adjusted neural training can identify unknown nonlinear dependencies and interactions across multiple variables and cluster them into groups. Supervised GMM (which helps in learning the class labels that were otherwise unknown) helps in overcoming the slow convergence of unsupervised GMM while still using its ability to cluster probabilistic data. Hence, pipelining an optimized K-means with GMM and neural training feedback, followed by FRBCS, to obtain a scalable yet efficient approach for alert classification forms the main rationale behind this work.
Although GMM has been shown in [29], [30] to be resilient against missing data and inaccuracies when capturing semantic relationships and annotating behavioral data, combining the output of such semantic relationships with seismic alert possibilities is a new attempt.
The second aim of this work is to find different settings that can classify more data by learning these relationships from the clustered groups. Guided by unsupervised clustering, a further pipelined classification can provide better insights for unlabeled data [31]. In this paper, a fuzzy rule-based classification system (FRBCS) is pipelined after the clusters are labeled, and hence a generic approach (ECGFC) is proposed. The aim behind using FRBCS is to understand the continuous probabilistic behavior of data within the clusters. The approach uses an optimized K-Means method [79] based on a sorted and dynamic centroid allocation technique. Also, FRBCS provides a basis for converting human-observed linguistic variables into a well-defined knowledge rule base. Such a rule base can be used to design a generic mathematical model that can flag alert situations based on the behavioral data of any specific species. However, in this paper we evaluate the proposed generic approach on classification metrics (refer Section: Results).
The datasets analyzed here contain behavioral data values for days close to the tsunami days of the year 1997 for the Netherlands and 2004 for India. To understand the intrinsic relationships between the existing data points, a class label is needed for mapping any consequent behavior as a response to seismic perturbations. Such a response, if observed well in time, can help generate timely alerts.
After applying the proposed approach, the new clustered labels are categorized into two class labels: one, which is the label for alert (days close to tsunami days), and zero for no-alert, which is normal adaptive behavior. This paper provides an efficient pipelined (ensemble clustering and classification) approach to classify underwater species behavior and use it as a precursor for future alerts. The proposed pipelined approach is initially evaluated on the prepared datasets under 25 different settings. These settings are obtained by pipelining clustering and fuzzy rule-based classification systems.
To ensure effectiveness, apart from evaluating the proposed algorithm (ECG) on the prepared behavioral MSDs, its performance has been tested on four benchmark datasets. The proposed algorithm (ECG) outperformed the baseline clustering methods and the existing state-of-the-art method [33] on the benchmark datasets (refer Section: Results). Here, cluster statistical analysis based on the Silhouette, Rand, and Dunn indices [34] is used to provide empirical evidence for the results. Recent contributions [35]-[37] have used these benchmark datasets primarily for statistical clustering analysis, and hence they form one of the bases of analysis in this article as well.
In addition, a comparison with existing ensemble clustering classification approaches is drawn on six other benchmark datasets. The benchmark datasets selected here consist of three small-scale and three large-scale datasets, where the former come from [32] and are identified as hard datasets. The comparison on large-scale datasets supports the scalability of the proposed approach.
The paper is an implementation of the scientific patent [38] published by the authors of this manuscript, with the following novel contributions:
• An ensemble clustering algorithm (ECG) is proposed and implemented based on modified threshold K-means (ITKM) for labeling the MSDs of three different aquatic species.
• The ECG algorithm is a bagging-based ensemble where a weighted cluster head selection is performed using a Gaussian Mixture Model and neural training.
• The existing baseline FRBCS have been sequenced after the grouped data obtained above for seismic alert classification.
• Real-time prediction using the unlabeled data acquired. To date, only unlabeled datasets are available; this work provides a transformation from unsupervised to supervised data.
The rest of the paper is organized as follows: Section II describes the related work. Section III initially describes the dataset collection and pre-processing, followed by the base state-of-the-art methods for behavioral data clustering, and finally, the proposed methodology. The results from applying the base as well as the proposed methods on behavioral datasets are shown in Section IV. The article concludes with Section V, which summarizes the main conclusions and identifies areas for future work.

II. RELATED WORKS
Various classification as well as clustering algorithms have been used for categorizing animal behavioral data. Understanding animal behavior (specifically for aquatic species) for any ambiguity can be challenging. One reason for this challenge is that identifying anomalous behavior trends needs continuous data recordings [39] from multiple sources. The lack of such continuous records and multi-source data fusion therefore leads to an incomplete dataset with missing values, for which a robust yet efficient learning algorithm is needed to classify or categorize behavior as per the constraints. The following subsections explore the existing models in the respective domains. Table 7, which summarizes the essential aspects and is further discussed in the conclusion, shows that the ensemble of clustering pipelined with a classification method for behavioral analysis goes beyond what was done in prior studies.

A. ANIMALS AS TSUNAMI PRECURSORS
Animals, both terrestrial and aquatic, have shown ambiguous behavior before and after various natural disasters. Various studies have highlighted the post-tsunami impact on the animal population [40]. However, due to the lack of deployed sensors and hence the unavailability of data, pre-tsunami impact analysis on animals remains an open area. Various studies have discussed that certain animals like fish [41], toads [42], elephants [43], and whales [44] have shown unusual behavior, as they could sense pre-tsunami signals which humans and machines could not. Therefore, a global warning system that generates tsunami alerts from animal behavior data analysis, which does not exist in the current state of the art, is a dire need for progress in tsunami science.

B. MACHINE LEARNING FOR ANIMAL BEHAVIOR CLASSIFICATION
Various machine learning algorithms have found applications in animal behavioral classification, some on terrestrial animals such as cattle [68] and sheep [69], and some on aquatic species [70], [71]. Beyond basic machine learning algorithms, some ensembles have also been applied for behavioral classification [72]. These applications have mainly been related to identifying grazing or migration patterns. Hence, analyzing changes in aquatic animal activity due to seismic perturbations is an open area that this article aims to explore.

C. ENSEMBLE MULTI-SOURCE CLUSTERING: INSIGHTS
In remote sensing, data is tapped from multiple sources, and soft clustering methods are needed to label such multi-view data. The success of ensembles in supervised classification tasks has motivated researchers to use them in unsupervised tasks [45]. Guidelines for selecting individual clustering algorithms are still lacking. Jointly mining clusters from multiple data sources has been emerging as a novel direction in the domain of clustering analysis. Fern and Lin [46] designed three ensemble selection methods based on quality and diversity. Hong et al. [47] introduced a novel selective clustering ensemble method through resampling. Azimi and Fern [48] proposed an adaptive cluster ensembles method. However, designing a well-weighted consensus function is essential to clustering ensembles. Apart from voting [83], [98], bagging ensembles have also given promising results. Tsymbal et al. [50] presented an iterative clustering through a weighted scheme that outperformed many other selective voting or bagging methods. Another method, spectral clustering [51], which labels communities based on graphical linkages, has been evaluated to be computationally expensive and hence needs a bagging method to improve the quality of label formation. The ensemble used here is different from multi-view clustering (MVC) [52], as the latter aims to find the cluster structure shared by multiple views of a particular dataset. Among the various improved clustering methods proposed and discussed [53]-[56], one method for unsupervised labeling with a low computational cost is the Gaussian Mixture Model (GMM). Recently, GMM has found applications in various areas [57], [58], and hence it forms a basis of the proposed ECG method of this article as well.

D. FRBCS FOR ANIMAL BEHAVIORAL CLASSIFICATION
Fuzzy models are used when a system cannot be defined in precise mathematical terms. The non-fuzzy or traditional representations require a well-structured model and well-defined model parameters. However, in practice, there may be uncertainties, unpredicted dynamics, and other unknown phenomena that cannot be mathematically modeled. The main contribution of fuzzy modeling theory is its ability to handle many practical problems that cannot be adequately represented by conventional methods [97]-[102].
In this work, the input to the proposed method is a set of animal behavioral datasets (D1, D2, and D3) whose complex nature can be modeled using fuzzy relations and rules [59]. Various methods have been used in the past to optimize the structure of the fuzzy model thus developed [60]-[62]. Since they are limited to specific objective functions, specific types of inference, and specific types of membership functions, this paper uses some standard structured methods based on partitioning and genetic algorithms for FRBCS modeling [65]. A combination of clustering with FRBCS has found applications in various fields [66], [67].

E. PIPELINING CLUSTERING & CLASSIFICATION
Among the various existing algorithms, SVM and Random Forests have been extensively used for animal behavioral classification [75], [76]. The ensemble nature of Random Forests, which use bagging and bootstrapping, has been cited as one of the reasons for their better performance. As stated by the authors in [24], not every learning algorithm performs well for all behavioral categories. Therefore, multiple machine learning algorithms and their ensembles are needed to study a specific behavioral category for which existing methods may perform poorly. Various fields, such as text classification [73], credit scoring [74], and image classification [77], have also used a pipelined combination of clustering and classification.

III. METHODOLOGY
As described in previous sections, pipelining clustering and fuzzy classification has found broad application in the field of data analytics and is thus the approach followed in this work for seismic prediction analysis. The GMM-based clustering ensemble (ECG) used here classifies different relationships among the features taken and assigns alert and no-alert labels as 1 and 0, respectively. The labeled clusters thus obtained are the input to standard FRBCS for further classification; hence a sequenced method, ECGFC, is presented. The resulting rule base can be used as a knowledge base for any further similar classification.
The first subsection describes one of the state-of-the-art clustering methods, already implemented in [78] from [79] and [33] on Sea Turtle behavioral data. The following subsection presents an intermediate method. These two clustering methods are initially compared based on cluster statistical analysis (refer Section: Results). The better-performing clustering method, assessed using cluster indices, is extended further into the proposed ECG method, presented in the third subsection along with the algorithm implementation. Finally, the last subsection explains the sequenced approach ECGFC.

A. IMPROVED CENTROID K-MEANS: ICKM
The method ICKM improves the baseline K-means method to circumvent the latter's spherical nature. Figure 1 gives the complete workflow of the method. The adequacy of the method for labeling behavioral data lies in the fact that animals dynamically change their direction or location, along with their underwater count, amid seismic perturbations. As the data is continuous and complex, there is still a need for an effective algorithm to learn cluster labels from the given unlabeled dataset. An improved clustering method [33] is thus discussed in the following subsection.

B. IMPROVED THRESHOLD K-MEANS: ITKM
This threshold-based clustering method is another improved method that can be used to label behavioral data. The algorithm ITKM, shown below, takes the marine behavioral dataset (mabD) as one of its inputs, with dimensions i × j, in which i is the number of entries and j is the size of the feature set. The other input to this method is k, the number of clusters. The method uses two functions, Weighted_Score and CalThreshold. C, the output of the latter, gives the clustered data points based on an improved threshold of K-Means. The workflow is shown in Figure 2.

Algorithm 1 ITKM
/* Input: mabD is the input marine behavioral dataset */
/* Input: N is the number of data points in the dataset */
/* Output: new set of refined cluster centers as C */

/* This function returns ws_i, the weighted score for each data point in the input dataset */
function Weighted_Score(mabD, N)
    array dp = [dp_1, dp_2, dp_3, ..., dp_m] in mabD
    array N = [n_1, n_2, n_3, ..., n_j] in mabD
    for i in 1 to n
        ...
    end for
    for i in 1 to n
        ws_i = w_i * dp_i
    end for
    return ws_i
end function

/* This function allocates each data point according to the newly identified centers */
function CalThreshold(ws_i, N, k)
    for i in 1 to k                  // k is the number of clusters
        for j in 1 to N              // N is the number of data points
            ...
            if F_min > ws_i
                set T_min = ws_i
            end if
        end for
    T_hv (threshold value) = (F_max - F_min) / 2
    C_1 = (F_max - T_hv) / 2
    C_2 = (T_hv - F_min) / 2
    for i in 1 to K                  // K is the number of clusters
        for j in 1 to K_i            // K_i is the number of objects of cluster i
            return G{C_1, C_2, C_3, ..., C_k} = sum over i = 1..K, j = 1..K_i of |F_ij - O_i|
            // F_ij is the j-th object of the i-th cluster, and O_i is the centroid of the i-th cluster
        end for
    end for
end function
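The thresholding idea in Algorithm 1 can be illustrated with a small Python sketch. It is not the authors' implementation: the weight computation is not recoverable from the listing, so the feature weights below are an assumption, as are the function names. The threshold follows the T_hv expression of the algorithm, while the refined centers are taken as plain group means (the usual K-means update) rather than the C_1/C_2 expressions.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def weighted_scores(rows: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """One score per data point: the weighted sum of its feature values.

    The weights are an assumption; the source listing does not show how
    w_i is computed.
    """
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


def threshold_value(scores: Sequence[float]) -> float:
    """T_hv = (F_max - F_min) / 2, as written in Algorithm 1."""
    return (max(scores) - min(scores)) / 2


def split_by_threshold(scores: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Label each score 1 if it lies above T_hv, else 0, and return group means.

    Using the group mean as the refined center is the standard K-means
    update and is an assumption here, not the C_1/C_2 rule of Algorithm 1.
    """
    t_hv = threshold_value(scores)
    labels = [1 if s > t_hv else 0 for s in scores]
    centers = []
    for k in (0, 1):
        group = [s for s, lab in zip(scores, labels) if lab == k]
        centers.append(mean(group) if group else float("nan"))
    return labels, centers


# Toy rows (made-up values): two features per observation, equal weights.
rows = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 10.0]]
scores = weighted_scores(rows, weights=[0.5, 0.5])
labels, centers = split_by_threshold(scores)
print(scores, labels, centers)
```

With real behavioral data the features would first need to be brought to a common scale, since the weighted score otherwise reflects whichever attribute has the largest range.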

C. PROPOSED ENSEMBLE CLUSTERING METHOD: ECG
This ensemble clustering technique uses the GMM, followed by Fuzzy C-Means clustering. To illustrate, datasets D1 (Sea Turtle), D2 (Earthworms), and D3 (Fish) contain animal behavioral feature values attributed in response to geophysical changes. An analysis of the data can help identify each period as either alert or no-alert prone, based on several parameters. The method first identifies the data points pertaining to a particular period, represented using weighted scores, and evaluates whether each data point lies above or below the threshold level of risk. This computation follows from the previous algorithm ITKM. The workflow for the proposed method is shown in Figure 3. Since this paper deals with fuzzy probabilistic data, in which a data point can belong to multiple clusters at a time with some probability, a soft clustering method is used. In terms of time taken, GMM suffers from slow convergence and takes more time to cluster data than K-Means, even on small datasets. To speed up the process, a supervised GMM method can be used. With prior knowledge about cluster/class labels, GMM can cluster data in a time comparable to that of a baseline K-means method.
The algorithm ECG refines the centroids of the classes and the belongingness of the data points using cluster centers and threshold levels, which helps in proper classification. The output of the clustering algorithm ITKM in the first step, together with the combined probability of belongingness of each data point to a cluster, is the input to the GMM. GMM refines the cluster centers by using the threshold, the probability of belongingness of a data point, and the objective function. In the following step, the GMM output is passed through Fuzzy C-Means, which takes into consideration the probabilities as well as the fuzzy rules formed. In the last step, the cluster output is used to train the neural network, helping to improve the clustered data.
The ensemble iteratively assigns each data point to the cluster whose centroid it is nearest to, for which it has a high probability of belonging, and a low likelihood of fitting the other cluster. The individual labels can be analyzed later to check which parameter of the data point played a dominant role in its move towards a particular class label. For simulation, the RStudio platform is used to implement the proposed ECG method. The R environment is an open-source platform that allows re-implementing existing packages [80] to devise new ensembles as per the data analysis needed.
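The idea of seeding the GMM with labels from an earlier clustering step can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: one-dimensional scores, two components, a fixed number of EM iterations, and hypothetical function names. It omits the Fuzzy C-Means and neural training stages and is not the authors' R implementation.

```python
import math
from typing import List, Sequence, Tuple

Component = Tuple[float, float, float]  # (mixing weight, mean, variance)
VAR_FLOOR = 1e-6  # keeps variances strictly positive


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def init_from_labels(xs: Sequence[float], labels: Sequence[int],
                     k: int) -> List[Component]:
    """Seed each component from the points carrying its prior label.

    Assumes every label 0..k-1 occurs at least once.
    """
    comps = []
    for c in range(k):
        pts = [x for x, lab in zip(xs, labels) if lab == c]
        mu = sum(pts) / len(pts)
        var = sum((p - mu) ** 2 for p in pts) / len(pts) + VAR_FLOOR
        comps.append((len(pts) / len(xs), mu, var))
    return comps


def em_step(xs: Sequence[float], comps: List[Component]):
    # E-step: probability of belongingness of each point to each component.
    resp = []
    for x in xs:
        dens = [w * normal_pdf(x, mu, var) for w, mu, var in comps]
        total = sum(dens)
        resp.append([d / total for d in dens])
    # M-step: re-estimate weight, mean and variance from those probabilities.
    new = []
    for c in range(len(comps)):
        n_c = sum(r[c] for r in resp)
        mu = sum(r[c] * x for r, x in zip(resp, xs)) / n_c
        var = sum(r[c] * (x - mu) ** 2 for r, x in zip(resp, xs)) / n_c + VAR_FLOOR
        new.append((n_c / len(xs), mu, var))
    return new, resp


def fit_seeded_gmm(xs: Sequence[float], labels: Sequence[int],
                   k: int = 2, iters: int = 20):
    comps = init_from_labels(xs, labels, k)
    resp: List[List[float]] = []
    for _ in range(iters):
        comps, resp = em_step(xs, comps)
    return comps, resp


# Toy weighted scores (made-up) with prior labels from a K-means-like step.
xs = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
prior = [0, 0, 0, 1, 1, 1]
comps, resp = fit_seeded_gmm(xs, prior)
soft_labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
print(comps, soft_labels)
```

Starting EM from label-derived means and variances, instead of random ones, is what lets the mixture settle in a few iterations on well-separated data; on overlapping data more iterations or a convergence check would be needed.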
D. PROPOSED SEQUENCED METHOD: ECGFC
Fuzzy rule-based classification systems have become a powerful tool in mining inferences from complex real-world problems using fuzzy concepts.
As animal behavioral data is non-linear, FRBCS has been used to deal with such behavioral data values [81].
However, this approach may not be feasible when facing complex tasks or when human experts are not available. An effective alternative is to generate the FRBCS model automatically from data by using learning methods. Many methods have been proposed for this learning task, clustering methods [82] being one of them. Hence, in this article, the approach ECGFC uses the proposed method ECG (refer Subsection C) and FRBCS for classification tasks. The FRBCS are used with their default parameters, as shown in Table 1, for further classification.
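To make the notion of a fuzzy rule base concrete, the following Python sketch classifies one observation with triangular membership functions, minimum as the AND operator, and a single-winner rule. The two features, their normalization to [0, 1], and the three rules are hypothetical illustrations, not the rule base learned by the FRBCS methods of Table 1.

```python
from typing import Callable, Dict, List, Tuple


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Linguistic terms over inputs normalized to [0, 1] (an assumption).
TERMS: Dict[str, Callable[[float], float]] = {
    "LOW": lambda x: triangular(x, -1.0, 0.0, 1.0),
    "HIGH": lambda x: triangular(x, 0.0, 1.0, 2.0),
}

# Hypothetical rules: (term for angle deviation, term for count) -> class,
# where 1 means alert and 0 means no-alert.
RULES: List[Tuple[Tuple[str, str], int]] = [
    (("HIGH", "LOW"), 1),
    (("LOW", "HIGH"), 0),
    (("LOW", "LOW"), 0),
]


def classify(angle_dev: float, count: float) -> int:
    """Return the class of the rule with the highest firing strength."""
    best_class, best_strength = 0, -1.0
    for (angle_term, count_term), label in RULES:
        strength = min(TERMS[angle_term](angle_dev), TERMS[count_term](count))
        if strength > best_strength:
            best_class, best_strength = label, strength
    return best_class


print(classify(angle_dev=0.9, count=0.1))  # strong deviation, few animals
print(classify(angle_dev=0.1, count=0.8))  # little deviation, many animals
```

In a learned FRBCS the membership function shapes and the rule consequents would come from the labeled clusters rather than being written by hand.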
For simulation, the proposed pipelined approach, whose workflow is shown in Figure 4, is implemented using the RStudio platform. For cluster statistics and baseline methods, R platform packages have been used.
Step 1: Preprocessing
The raw data for the three species mentioned is preprocessed to convert it into an attributed dataset (D1, D2, and D3).

Step 2: ITKM & ECG application on the featured dataset
The two methods ICKM (refer Subsection A) and ITKM (refer Subsection B) are initially applied to D1, D2, and D3.
As the latter produces better cluster quality (refer Section IV), as identified by certain cluster performance indices, ITKM is further improved to give ECG (refer Subsection C). Both ITKM and ECG are applied to the featured dataset.
Step 3: Compare and Analyze
Evaluate both ITKM and ECG based on cluster statistics using the following three equations (1), (2), and (3) [34], [78]:

$$S[i] = \frac{smad(dp_i) - ad(dp_i)}{\max\{smad(dp_i),\ ad(dp_i)\}} \tag{1}$$

Here S[i] is the Silhouette Coefficient for the i-th data point. For a given data point dp_i, smad(dp_i) is the smallest average distance from dp_i to all points of any cluster to which dp_i does not belong, and ad(dp_i) is the average distance between dp_i and all other data within the same cluster. S[i] summed over all points is termed here as SC (Silhouette Coefficient).

$$Dunn = \frac{min.separation}{max.diameter} \tag{2}$$

where min.separation is the minimum separation between clusters, and max.diameter is the maximum cluster diameter, i.e. the compactness within a cluster. Here the cluster separation is between the two clusters labeled 0 for no-alert and 1 for alert, as per animal behavior in response to changing physical conditions.

$$Rand = \frac{agreements}{agreements + disagreements} \tag{3}$$

The Rand index computes the ratio of agreements to the total of agreements and disagreements between the two classes, here alert and no-alert.
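The three indices can be computed directly from their definitions. The Python sketch below is a minimal illustration on made-up 2-D points, assuming Euclidean distance, at least two clusters, and at least two points per cluster; it reports the mean silhouette over all points and is not the R package code used in the paper.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def silhouette(points: Sequence[Point], labels: List[int]) -> float:
    """Mean of (smad - ad) / max(smad, ad) over all points."""
    values = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        ad = sum(own) / len(own)  # average distance inside own cluster
        smad = min(               # smallest average distance to another cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        values.append((smad - ad) / max(smad, ad))
    return sum(values) / len(values)


def dunn(points: Sequence[Point], labels: List[int]) -> float:
    """Minimum between-cluster separation over maximum cluster diameter."""
    pairs = list(combinations(zip(points, labels), 2))
    separation = min(math.dist(p, q) for (p, lp), (q, lq) in pairs if lp != lq)
    diameter = max(math.dist(p, q) for (p, lp), (q, lq) in pairs if lp == lq)
    return separation / diameter


def rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of point pairs on which two labelings agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Toy data (made-up): two tight groups, labeled 0 (no-alert) and 1 (alert).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
print(silhouette(points, labels), dunn(points, labels))
print(rand_index(labels, [1, 1, 0, 0]), rand_index(labels, [0, 1, 0, 1]))
```

Note that the Rand index is unaffected by swapping the names of the two classes, which is why the labeling [1, 1, 0, 0] agrees perfectly with [0, 0, 1, 1].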
Step 4: ECG: improved clusters
As per the results shown in Table 4, the proposed ECG performs better in terms of cluster statistics than ITKM.
Step 6: ECGFC: the proposed sequenced method
Finally, the output of the ensemble clustering method evaluated above is given to FRBCS, which gives the pipelined approach ECGFC: a generic framework that can classify alert signals from dynamically changing, probabilistic, unlabeled data capturing aquatic behavior (which needs soft clustering methods like GMM) and further capture a reduced-knowledge fuzzy rule base through adjusted neural training. Studying the rule base reduction and the time reduction introduced by neural training in the proposed approach is another idea we aim to work on in the future. Currently, the pipelined approach, obtained in the form of 25 different arrangements by permuting six baseline FRBCS with various baseline, existing, and proposed clustering methods (refer Table 5), is evaluated for classification metrics.
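The classification error metrics named in the abstract (MSE, MAE, and SMAPE) can be written out as a short Python sketch. The SMAPE variant below, with the mean of absolute values in the denominator and a zero contribution when both actual and predicted values are zero, is an assumption, since several definitions are in use and the paper's exact form is not shown here. The label vectors are made-up.

```python
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error as a fraction in [0, 2].

    A pair where both values are zero contributes 0 (an assumption that
    avoids 0/0 for correctly predicted no-alert labels).
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        if denom > 0:
            total += abs(a - p) / denom
    return total / len(actual)


# Toy alert (1) / no-alert (0) labels, with one missed alert.
actual = [1, 0, 1, 1]
predicted = [1, 0, 0, 1]
print(mse(actual, predicted), mae(actual, predicted), smape(actual, predicted))
```

For 0/1 labels MSE and MAE coincide and equal the misclassification rate, so differences between the two only appear when the classifier outputs graded scores.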

IV. RESULTS AND DISCUSSION
This section describes all aspects of data collection and preprocessing followed by the analysis and discussion.

A. DATASET DESCRIPTION AND PRE-PROCESSING
The dataset(s) used in this work capture the behavioral activity of three underwater aquatic species: sea turtles (scientific name: Caretta caretta), earthworms (found on the underwater seabed), and fish. It has been observed that animals or micro-organisms can behave as biological sensors to predict seismic disturbance on land and in water, based on the underground geophysical information they sense, with lead times ranging from 24 hours to many days before the observable event [84], [19]. The effect of a seismic-driven geophysical change on the three mentioned species is described in [85]. The dataset of varying geophysical values is modeled from [86], which shows that tsunami flow induces an electric current and an associated secondary magnetic field.
An intersecting timestamp fusion conducted for all three species yields the associated behavioral data values, taken from [86] for Sea Turtles and [87] for Earthworms and Fish. These data sources provide the raw latitude and longitude values and the underwater count. The data is pre-processed by calculating two features, Angle of Deflection (Day) and Angle of Deflection (Monthly), using the Haversine equations already used by the authors in [88].
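As a rough illustration of how navigation features can be derived from raw latitude and longitude, the Python sketch below computes the Haversine great-circle distance, the initial bearing between two fixes, and the change in bearing between consecutive legs. Treating the change in bearing as the angle of deflection is an assumption; the exact feature definition from [88] is not reproduced here, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Compass bearing in [0, 360) when leaving point 1 towards point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    y = math.sin(dlam) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def deflection_deg(bearing_before: float, bearing_after: float) -> float:
    """Signed change in heading, wrapped to [-180, 180)."""
    return (bearing_after - bearing_before + 180.0) % 360.0 - 180.0


# Toy track (made-up fixes): head east along the equator, then turn north.
east = initial_bearing_deg(0.0, 0.0, 0.0, 1.0)
north = initial_bearing_deg(0.0, 1.0, 1.0, 1.0)
print(haversine_km(0.0, 0.0, 0.0, 1.0), east, north, deflection_deg(east, north))
```

A daily or monthly feature would then aggregate these per-leg deflections over the fixes that fall within each day or month.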
A group-by clause over the recurring species id gives the underwater Count for the respective species. Finally, the pre-processed dataset is mapped to the varying electromagnetic values obtained under the intersecting timestamp. Table 2 shows the size of the collected datasets, while Table 3 displays the meta-data with the range of values of the data prepared for all organisms.
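The group-by and timestamp fusion described above can be sketched with plain Python containers. The record layout, the field names, and all values below are hypothetical; the sketch only shows counting sightings per timestamp and species id and keeping the timestamps that also have a geophysical reading.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_by_time_and_species(
        sightings: Iterable[Tuple[str, str]]) -> Counter:
    """Group sightings by (timestamp, species id) and count them."""
    return Counter(sightings)


def fuse_on_timestamp(counts: Counter,
                      geophysical: Dict[str, float]) -> List[dict]:
    """Inner join: keep only timestamps present in both sources."""
    rows = []
    for (timestamp, species_id), n in sorted(counts.items()):
        if timestamp in geophysical:
            rows.append({
                "timestamp": timestamp,
                "species_id": species_id,
                "count": n,
                "em_value": geophysical[timestamp],
            })
    return rows


# Made-up records: (timestamp, species id) per sighting, and one
# electromagnetic reading per timestamp.
sightings = [
    ("2004-12-24", "T1"), ("2004-12-24", "T1"),
    ("2004-12-25", "T1"), ("2004-12-26", "T2"),
]
geophysical = {"2004-12-24": 0.8, "2004-12-25": 1.4}
fused = fuse_on_timestamp(count_by_time_and_species(sightings), geophysical)
print(fused)
```

The sighting on the timestamp without a geophysical reading is dropped, which mirrors the idea of fusing only over the intersection of timestamps.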

B. ECG ON BEHAVIORAL DATASETS: EVALUATION AND COMPARISON
We have evaluated the cluster statistical indices given by equations 1, 2, and 3 for the initial method ICKM, for ITKM, and for the proposed method (ECG) on the three aquatic animal behavioral datasets: Sea Turtle (D1), Earthworms (D2), and Fish (D3). ITKM, as explained, is an intermediate method developed for the further refinement achieved in the finally proposed method ECG. The results show a comparison between these three clustering methods and three baseline partitioning clustering methods, viz. Hierarchical K-means, Fuzzy C-Means, and K-means [89]. Here, the proposed ECG outperforms all the baseline methods considered. The outperformance is empirically supported by the three cluster indices calculated: Silhouette Coefficient, Dunn Index, and Rand Index. These indices are used to compare the performance of the clustering algorithms in terms of cluster quality, ability to accurately find the intrinsic groupings, and agreement within the clusters [90].
In Table 4, the proposed method gives higher Silhouette Coefficient, Dunn Index, and Rand Index values on all three datasets than the other methods. A high silhouette score signifies the ability of the method to effectively cluster the intrinsic relationships into crisp groups. The Dunn index for ECG evaluates to 0.55 (D1), 0.53 (D2), and 0.56 (D3), which is marginally higher than that of baseline K-Means or Fuzzy C-Means. A higher value of the Rand index, which is an extrinsic validation index, also provides evidence of the outperformance of the proposed method. There is also variation across the three behavioral datasets, specifically in the silhouette value; for all other methods there is only a small difference in the other two indices.
Hence, the unlabeled data values from all three datasets can now be labeled to form the common clusters framed for each data point in the sea turtle, earthworm, and fish behavioral datasets separately. The improvement in cluster identification for the behavior of all three species, plotted monthly between the Count parameter of the particular species and the Angle of Deviation in navigation, is depicted in Figures 5, 6, and 7. These two parameters were chosen for comparison because of the hypothesis discussed in [33], where a change in navigation in response to abnormal seismic activity brings about an abnormal change in species population count.
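As an illustration of the three cluster indices used in this comparison, the following is a minimal sketch based on their common textbook definitions with Euclidean distance. It is an assumption that these match equations 1, 2 and 3 of this paper; the function names and the toy points are hypothetical. The sketch assumes at least two clusters and at least one cluster with two or more points.

```python
from itertools import combinations
from math import dist


def _group(points, labels):
    """Collect points into {label: [points]}."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    return clusters


def silhouette(points, labels):
    """Mean silhouette coefficient over all points."""
    clusters = _group(points, labels)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for k, other in clusters.items() if k != l)
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return sum(scores) / len(scores)


def dunn_index(points, labels):
    """Smallest between-cluster distance over largest cluster diameter."""
    clusters = list(_group(points, labels).values())
    min_between = min(dist(p, q)
                      for ca, cb in combinations(clusters, 2)
                      for p in ca for q in cb)
    max_diameter = max(dist(p, q)
                       for c in clusters for p, q in combinations(c, 2))
    return min_between / max_diameter


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


# Toy example: two well-separated groups of two points each.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
print(silhouette(points, labels), dunn_index(points, labels))
print(rand_index(labels, [1, 1, 0, 0]))  # invariant to label renaming
```

The silhouette and Dunn indices need only the data and the cluster labels, whereas the Rand index compares the labeling against a reference labeling, which is why it is treated as an extrinsic index.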

C. ECG ON BENCHMARK DATASETS: EVALUATION AND COMPARISON
This section describes the statistical cluster analysis performed for two existing baselines (K-Means and Hierarchical clustering), one intermediate method (ITKM) and the proposed method (ECG) on UCI benchmark datasets [91]. Table 5 presents the results of the three cluster indices over four benchmark datasets for the mentioned methods. As shown in Table 4, the intermediate method ITKM outperformed ICKM in all clustering comparison metrics for all behavioral datasets considered for the given problem; hence the benchmark datasets are only evaluated on ITKM and ECG, where the latter shows successful values for all the benchmark datasets considered. For example, ECG gives a Silhouette Coefficient of 0.98 for iris and 0.88 for the breast_cancer dataset, while ITKM gives 0.68 and 0.56 for the same respectively, and K-Means gives 0.49 and 0.45.

TABLE 4. Accuracy of the competing methods in terms of (a) Silhouette Coefficient, (b) Dunn-Index and (c) Rand Index. There is no particular baseline which is the best across all datasets.

D. PERFORMANCE EVALUATION TO SELECT BEST-SEQUENCED METHOD
For the sequenced approach, we permuted all feasible sequences of five (5) baseline FRBCS methods with two (2) standard clustering methods, two previously used methods and one proposed method (ICKM, ITKM, and ECG), and selected the best ensemble as ECGFC. While ITKM has outperformed ICKM in terms of cluster statistics (Table 4), it is still worth examining whether any improvement remains after a fuzzy classification of the ensemble. Table 6 presents the analysis to select the best sequenced approach of clustering and classification (ECGFC). Of the

SMAPE (Symmetric Mean Absolute Percentage Error)
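For reference, the three error metrics reported in this evaluation (RMSE, MAE and SMAPE) can be sketched as below. Several SMAPE variants exist; the one shown, with the mean of the absolute actual and predicted values in the denominator and a result in percent, is an assumption and may differ from the exact definition used in this paper. The sample values are made up.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """SMAPE in percent: mean of |a - p| / ((|a| + |p|) / 2).

    Pairs where both values are zero contribute 0 to the sum.
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        total += abs(a - p) / denom if denom else 0.0
    return 100 * total / len(actual)


# Made-up alert (1) / no-alert (0) targets and classifier outputs.
actual = [1, 0, 1, 1]
predicted = [1, 0, 0.5, 1]
print(rmse(actual, predicted), mae(actual, predicted), smape(actual, predicted))
```

RMSE penalizes large individual errors more strongly than MAE, while SMAPE expresses the error relative to the magnitude of the values being compared.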
For all three datasets, which capture the behavior of sea turtle, earthworm, and fish, the devised ECGFC having ITKM and GFS.GCCL has the following values for RMSE, MAE and SMAPE: 0.14756, 0.54345 and 13.67 for the turtle behavioral dataset (D1), and 0.251248, 0.52213 and 12.78 for the earthworm behavioral dataset (D2). Similarly low error values are observed for the fish behavioral dataset (D3). The baseline methods such as K-Means and Hierarchical K-Means have also been sequenced with all standard FRBCS, as presented in Table 5. Evidently, FH.GBML and GFS.GCCL have low error values when bagged with baseline clustering methods. Out of all 25 different settings, the two that performed best use the proposed ensemble clustering methods (ICKM, ITKM, and ECG). Figure 8 and Figure 9 show RMSE and SMAPE plots for all three behavioral datasets (Sea Turtle, Earthworms, and Fish) for all 25 ensembles, numbered here from 1 to 25. As shown, the sea turtle dataset displays the lowest RMSE values for all permuted ensembles, while the fish dataset shows the highest. The mentioned metrics, RMSE and MAE, provide generalized probable accuracy criteria for unseen and unlabeled data [49]. Sea turtles may therefore serve as a fairly powerful predictor for detecting underwater seismic activity, as opposed to fish or earthworms. Nonetheless, it can be inferred from the high SMAPE values that more efficient methods can be planned and implemented for higher precision.
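The enumeration of the 25 sequenced combinations and the selection of the one with the lowest error can be sketched as follows. Only GFS.GCCL and FH.GBML are named in this section, so the other three classifier entries are placeholders; the numbering order and the error values are likewise assumptions made for illustration and do not reproduce Table 6 or the figures.

```python
from itertools import product

# Clustering stage: two standard baselines plus the three methods in the text.
CLUSTERERS = ["K-Means", "Hierarchical K-Means", "ICKM", "ITKM", "ECG"]
# Classification stage: the last three names are hypothetical placeholders
# for the remaining baseline FRBCS methods.
CLASSIFIERS = ["GFS.GCCL", "FH.GBML", "FRBCS-3", "FRBCS-4", "FRBCS-5"]


def enumerate_pipelines():
    """Number every (clusterer, classifier) pair from 1 to 25."""
    pairs = product(CLUSTERERS, CLASSIFIERS)
    return {i: pair for i, pair in enumerate(pairs, start=1)}


def select_best(errors):
    """Return the pipeline number with the lowest error value."""
    return min(errors, key=errors.get)


pipelines = enumerate_pipelines()

# Made-up error values purely to exercise the selection step; entry 21 is
# set low by hand and is not a computed or reported result.
dummy_errors = {i: 1.0 for i in pipelines}
dummy_errors[21] = 0.1

best = select_best(dummy_errors)
print(best, pipelines[best])
```

In an actual evaluation, the error dictionary would be filled with the RMSE, MAE or SMAPE value of each sequenced combination on a given behavioral dataset.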

E. SEQUENCED METHODS WITH EXISTING ENSEMBLES
From Table 6, the best-performing combination of clustering and classification (ECGFC) is ECG sequenced with GFS.GCCL. For all three behavioral datasets, highly reliable results make this combination one of the most powerful ensembles for predicting anomalous patterns in aquatic animal activity prior to seismic disturbance. The FRBCS used here creates a knowledge base of specific fuzzy rules from labeled data to identify any similar abnormality in the future. To further validate the performance of the selected ensemble, four benchmark datasets are analyzed in order to compare the sequenced ECGFC approach with current state-of-the-art ensembles. The results are shown in Table 7. Because the benchmark datasets are already clustered, the existing labels are removed here and the results are evaluated after applying ECGFC. Various ensembles and combined frameworks have been proposed in the state of the art; however, only clustering-classification approaches are selected for the comparison presented. The existing frameworks [93], [94] have already been used for breast cancer profile identification and some other benchmark datasets, respectively. The comparison here is made in terms of the percentage of data instances correctly classified after applying ECGFC. The sequenced method ECGFC gives higher accuracy for the considered benchmark datasets: Iris, Wisconsin Breast Cancer, Heart Disease and Car Evaluation [91].
In order to understand the performance statistics with respect to scalability, benchmark datasets of different scales were selected. Here, a small dataset, iris (150 data points), was considered for performance comparison and evaluation along with one mid-scale dataset (instances < 2000) and two large-scale datasets (instances > 2000) taken from the UCI repository [91]. The reason for evaluating small-scale datasets is that the behavioral datasets collected and prepared were low-scale; hence the empirical evaluation and validation of the proposed method was carried out initially on small-scale benchmark datasets. However, to comment on scalability and efficiency and to analyze the trade-off between the two, one of the recent big-data fuzzy algorithms (Chi-BD [104]) from [37] has been used for the evaluation.
The comparison reaffirms the need to pipeline an ECG pre-processing clustering method prior to the FRBCS (Fuzzy Rule-Based Classification Systems) baseline, where ECG provided optimized, quick labeling and further assisted in improving classification. Even on a large dataset, the proposed sequenced approach outperformed the current ensembles. Compared to the 77.08 reported by the authors in [37] using one of the big-data fuzzy classification algorithms, the percentage of data instances correctly classified after application of ECGFC amounts to 89.46. Table 8 summarizes the most relevant related articles that have employed ensemble clustering or classification on animal behavioral datasets. A fair comparison is not possible, as this work is a new attempt to integrate behavioral response analytics on a prepared MSD for seismic alert classification, even though the behavioral study of both marine and terrestrial animals has appeared in the literature for many other applications. In the presented comparison, ensemble classification has provided insights about cattle monitoring [72] and conservation area identification in dolphins [97].
The state of the art, as shown, differs from the current analysis in terms of applications, including the type of animal considered. From the analysis presented, it can be summarized that an integrated, scalable approach that finds marine behavioral patterns to assist seismic alerts in the form of a global system is, at present, an unexplored application area.

V. CONCLUSION
In this paper, three behavioral datasets have been used to identify any pattern that can help in retrieving real-time alerts on seismic disturbances. The species identified for this purpose are sea turtles, earthworms, and fish. The data is prepared by combining both biological and geophysical sources. From the prepared dataset, a classification identifying alert or no-alert situations can be performed based on ambiguous behavioral signals exhibited by marine species in response to seismic perturbations. This article focuses on the labeling of prepared data using a proposed ensemble clustering method, ECG, which is evaluated on four benchmark datasets and three behavioral datasets in terms of cluster statistics, viz. Silhouette Coefficient, Dunn Index, and Rand Index. For classifying alert and no-alert data points, the proposed ECG method is sequenced with state-of-the-art fuzzy rule-based classification methods. A total of 25 sequenced combinations are evaluated on all three behavioral datasets for MSE, MAE, and SMAPE. One of the advantages of the proposed sequenced approach is its generic adaptability to behavioral datasets. The results from the cluster indices empirically support the ability of ensemble approaches for similar application areas.
The sea turtle dataset thus prepared showed the best MSE, MAE, and SMAPE values. The sequenced method combining the proposed ECG and GFS.GCCL showed the lowest error statistics across all three datasets. Therefore, using such optimized sequenced methods, behavioral datasets can be easily classified. Such improved classification can form a basis to program various global epidemic systems to raise alerts from animal behavior based on the mined patterns learned.
The goal of this research is to progressively move towards a global TWS using marine behavioral statistical analysis. Finding more categories of species (both terrestrial and aquatic) that can act as biosensors for seismic alert classification is one of the areas this research opens up. More data sources need to be developed that can help in deriving more insights using machine and deep learning methods. In the future, we plan to use other recently proposed fuzzy classification methods for both big and small-scale data, where further validation will be sought. Furthermore, investigating how the proposed solution allows for a reduced rule base, and how it reduces time, forms another direction of this research.

The Network with HQ in Seattle, WA, USA, currently has more than 1 000 scientific members from more than 100 countries. As an Investigator/Co-Investigator, he has won research grants worth more than 100 Million U.S.$ from Australia, USA, EU, Italy, Czech Republic, France, Malaysia, and China. He also works in a multi-disciplinary environment involving machine intelligence, cyber-physical systems, the Internet of Things, network security, sensor networks, Web intelligence, Web services, and data mining, applied to various real-world problems. In these areas, he has authored or coauthored more than 1 400 research publications, out of which there are more than 100 books covering various aspects of computer science. One of his books was translated to Japanese and a few other articles were translated to Russian and Chinese. More than 1 200 publications are indexed by Scopus and more than 900 are indexed by Thomson ISI Web of Science. Some of the articles are available in the ScienceDirect Top 25 hottest articles. He has more than 1 200 coauthors originating from more than 40 countries. He has more than 38 000 academic citations (H-index of 93 as per Google Scholar). He has given more than 100 plenary lectures and conference tutorials (in more than 20 countries).
For his research, he has won seven Best Paper Awards at prestigious international conferences held in Belgium, Canada, Bahrain, Czech Republic, China, and India. Since 2008, he has been the Chair of IEEE Systems, Man, and Cybernetics Society Technical Committee on Soft Computing (which has more than 200 members) and has served as a Distinguished Lecturer of the IEEE Computer Society representing Europe, from 2011 to 2013. He is the Editor-in-Chief of Engineering Applications of Artificial Intelligence (EAAI) and serves/served on the Editorial Board of more than 15 international journals indexed by Thomson ISI. He is actively involved in the organization of several academic conferences, and some of them are now annual events.
LORENZO SALAS-MORERA (Member, IEEE) received the M.Sc. degree in agricultural engineering and the Ph.D. degree in agricultural engineering from the University of Córdoba, Spain, in 1989 and 1993, respectively. He is currently a Professor with the University of Córdoba in the area of projects engineering. He is the Vice-Chancellor of Digital University and Strategic Planning of Córdoba University. His primary areas of research are engineering design optimization, intelligent systems, machine learning, user adaptive systems, interactive evolutionary computation, project management, risk prevention in automatic systems, and educational technology.
LAURA GARCIA-HERNANDEZ (Member, IEEE) received the M.Sc. degree in computer science from the Universitat Oberta de Catalunya, Spain, in 2007, and the European Ph.D. degree in engineering from the University of Córdoba, Spain, and also from the Institut Français de Mécanique Avancée, Clermont-Ferrand, France, in 2011. She has been an Invited Professor for a semester with the Institut Français de Mécanique Avancée, Clermont-Ferrand. She is currently an Associate Professor in project management with the University of Córdoba. Her primary areas of research are engineering design optimization, intelligent systems, machine learning, user adaptive systems, interactive evolutionary computation, project management, risk prevention in automatic systems, and educational technology. In these fields, she has authored or coauthored more than 70 international research publications. She has given several invited talks in different countries. She has completed several postdoctoral internships in different countries with a total duration of more than two years. She received the prestigious National Government Research Grant José Castillejo supporting her postdoctoral research for six months with the University of Algarve, Portugal. She has been a Principal Investigator in two Spanish research projects and has also been a Collaborating Investigator in some research contracts and projects. She is an Expert Member of the ISO/TC 184/SC working team and the National Standards Institute of Spain (UNE). Moreover, she is a member of the Spanish Association of Engineering Projects (IPMA Spain). For her research, she received the Young Researcher Award granted by the Spanish Association of Engineering Projects (IPMA), Spain, in 2015. Additionally, she twice received the General Council of Official Colleges Award at the prestigious International Conference on Project Management and Engineering, in both the 2017 and 2018 editions.
She is the Co-Editor-in-Chief of the Journal of Information Assurance and Security. She is an Associate Editor of the following ISI journals: Applied Soft Computing and the Journal of Intelligent Manufacturing.