Hybrid Data Regression Model Based on the Generalized Adaptive Resonance Theory Neural Network


Perceptron (MLP) [4], [5] and Radial Basis Function (RBF) [4], [6] models. The MLP network handles regression problems by using its hidden neurons to build a nonlinear transformation from a combination of sigmoid functions. The RBF network, on the other hand, handles regression problems by combining nonlinear semi-parametric functions, such as the Gaussian kernel function.
However, in a standard MLP or RBF network, learning is conducted in the offline or batch learning mode [7]. In batch learning, the network weights are updated only after every training sample has been presented in an iteration [8]. Because batch learning typically involves multiple iterations through the training samples, it is usually time-consuming. A re-training procedure is also necessary when a new sample is presented to the network, and this re-training requires the new sample together with all past samples [7]. In addition, choosing proper learning and network parameters, such as the number of nodes, number of learning epochs, learning rate, and stopping criteria, requires a lot of fine-tuning [7]. These parameters are usually chosen after a series of time-consuming trial-and-error experiments.
The batch learning approach is useful only when the data environment is stationary, and provided that the training samples are sufficiently representative. This is because, during learning, information provided by the training samples collected from the environment is encoded by the adjustment (learning) of the network weights. After validating the network performance, the network is put into operation, and no further weight adaptation (or learning) takes place. When the network is presented with an unseen new sample, it has no built-in mechanism to recognize the novelty. In order to learn new information, the network needs to be re-trained using the new sample together with all previous samples. This is a major drawback in most neural network models, and it arises from the so-called stability-plasticity dilemma [8]. The dilemma underlies a series of questions, i.e., how a learning system is able to remain plastic or adaptive in response to significant events, and yet remain stable in response to irrelevant ones; how a learning system is able to adapt to new information without corrupting or forgetting previously learned information [8].
To address the stability-plasticity dilemma, an Adaptive Resonance Theory (ART)-based neural network, namely Fuzzy ARTMAP (FAM) [9], has been proposed. Its learning strategy involves the fuzzy minimum operator, and the weight of a neuron (commonly called a ''category'' in FAM) is a hyper-rectangle. FAM is capable of online learning of new data samples without disturbing the knowledge learned from previous data samples. For the case where a set of data samples is available before online learning, an Ordering Algorithm [10] has been proposed to arrange the presentation of data samples. This ordering algorithm requires one parameter, i.e., the number of cluster centers, which is set to the number of classes of the application plus one. The use of the Ordering Algorithm improves performance (better accuracy rate and smaller network size); however, it is limited to pattern classification problems, and it is considered batch-mode learning because all data samples must be available before a fixed order of presentation can be decided. From the learning strategy of FAM, with or without the Ordering Algorithm, we can conclude that these approaches are applicable only to pattern classification problems and are not feasible for data regression tasks.
This serves as our motivation to propose an extended ART-based model that is capable of online learning and of solving data regression tasks, namely the Enhanced Generalized Adaptive Resonance Theory (EGART) model. Unlike the standard ART-based neural networks, the EGART model does not use the fuzzy minimum operator (hence, the weights of a neuron are not of a hyper-rectangular shape); instead, it uses the Gaussian distribution for learning and representation of its weights. In addition, this paper presents three different operating strategies of the proposed EGART model: (i) a fully online learning EGART; (ii) when a set of data samples is available, the proposed EGART model can be combined with the Ordering Algorithm (namely the Ordered-EGART model) for offline learning with better performance (smaller regression error and smaller network size, i.e., fewer categories created) than the online EGART model; (iii) EGART initially learns from a set of data samples in offline mode and then continues to learn new knowledge from newly available data (namely the IO-EGART model).
Therefore, the objectives of this paper are four-fold: (i) to propose an extended ART-based model for solving regression tasks; (ii) to apply the Ordering Algorithm, originally designed for FAM, to the extended ART-based model; (iii) to propose a strategy that determines the number of cluster centers of the Ordering Algorithm for solving regression tasks; (iv) to demonstrate that the extended ART-based model, previously trained in batch mode (i.e., learning from a set of data samples whose presentation order is determined by the Ordering Algorithm), can continue to perform online learning on newly available data samples.
The proposed EGART model with its three operating strategies is tested with five UCI benchmark datasets and two other applications in fire safety engineering, and the results are benchmarked against other approaches reported in the literature. The results suggest that all three operating strategies of the EGART model achieve performance similar (if not superior) to that of other approaches.
The organization of this paper is as follows. Section II presents the background of this research. Section III presents the detailed algorithms of the EGART model as well as the three different operating strategies. The experimental studies and results comparison are given in Section IV. A summary of this paper and suggestions for further work are presented in Section V.
VOLUME 7, 2019

II. BACKGROUND
Over the last two decades, neural networks and learning systems have been developed for solving pattern classification problems in various domains. In power engineering, successful applications of neural networks and learning systems have been reported, including power load or price forecasting [11], condition monitoring of circulating water systems in power generation plants [12], [13], and prediction of harmful gas emissions in power generation plants [14]. In image processing and pattern recognition, neural networks and learning systems have been successfully implemented in real-world applications such as deep learning for face verification [15], multi-task learning for blind source separation [16], a patch-based low-rank tensor decomposition algorithm for hyperspectral image compression and reconstruction [17], and unsupervised deep learning of a stacked convolutional denoising auto-encoder for feature representation [18]. In high-speed railway systems, successful applications of neural networks and learning systems have also been reported, including modelling and parallel control systems [19], descriptor estimator-based incipient fault estimation [20], and deep neural networks for mechanical fault diagnosis [21].
In the literature, online or sequential learning is an appealing way to address the limitations associated with batch learning. In a dynamic environment, the ability of a learning system to operate in real time while acting and reacting autonomously is important. The system has to adjust its own controlling parameters, and even its structure, depending on the needs of the dynamic environment. Networks with such properties include the Resource Allocation Network (RAN) [22] and its extensions, e.g., the Resource Allocation Network with Extended Kalman Filter (RANEKF) [23], the Minimum Resource Allocation Network (MRAN) [10], the Growing and Pruning RBF (GAPRBF) [24], the Generalized Growing and Pruning RBF (GGAPRBF) [25], and single and two hidden layer feedforward networks [26]. In addition, the Online Sequential Extreme Learning Machine (OSELM) [27] uses the sequential least-squares method to minimize the error function. OSELM works with both additive-sigmoid and RBF hidden neurons. For the additive-sigmoid hidden neurons, the input weights and biases are generated randomly and do not change during the training phase; the output weights are determined analytically. Similarly, for OSELM with RBF hidden neurons, the input weights, which are the centers and widths of the RBF functions, are randomly generated, and the output weights are analytically determined.
The incremental non-iterative learning method in [28] constitutes an online and hyperparameter-free learning method for the one-layer feedforward neural network without hidden layers. The principle is to use a square loss function to measure the error, and the method is competent in handling large-scale real-time learning problems [29]. Oscar et al. [30] presented an online learning algorithm for a two-layer feedforward neural network. The algorithm includes a factor that quantifies the error committed for each data sample. Such a method is effective in both static and dynamic environments. A more recent work by Wang et al. [31] suggested an online reliability time series prediction method that combines a Convolutional Neural Network (CNN) and the Long Short-Term Memory (LSTM). The proposed method was used to analyze the historical response time series and throughput time series of service-oriented architecture services, and to predict the reliability of the service system operation in the near future. Another interesting work, on a new spiking neural network architecture, also has the online learning capability [32]. It uses multiple learning strategies to learn each input spike pattern by itself. The learning algorithm is able to add a neuron, update the related network parameters, or delete a spike pattern. The algorithm uses information in the network from both local and global sources, and it performs better than batch-learning methods.
On the other hand, a family of online learning neural networks based on the Adaptive Resonance Theory (ART) has been developed for solving both clustering and classification problems [9], [33], [34]. While ART models are generally useful for solving classification tasks, there are some ART variants that are capable of solving both classification and regression problems [10], [35], [36], along with online learning capability. The following are the properties of online learning as derived from [34]. Similar online learning properties are also given in [7], [34].
Property-1: At any time of the training cycle, only the latest sample is needed for learning instead of all previous training samples.
Property-2: During the training cycle, a training sample is required to be presented only once for learning purposes. No re-iteration through the training set is necessary.
Property-3: Ability to perform incremental learning of new knowledge without corrupting the existing knowledge base.
Property-4: Ability to make a prediction if a new, unlabeled sample arrives at any time during the training cycle.
Fuzzy ARTMAP (FAM) [35] is one of the most popular ART-based models. It has been pointed out that the performance of FAM is affected by the presentation sequence of training samples [35]. One of the solutions is to train several networks, each of which is given a different presentation sequence of training data, and then to combine the prediction results. However, such an approach demands an overly high computational load. In [10], an ordering algorithm that is based on the Max-Min clustering method to determine a fixed presentation sequence of training data to FAM has been proposed.
Later, GART [36] was developed. It incorporates a modified Gaussian Adaptive Resonance Theory (GA) model [34], which is a learning model based on a hybrid of a Gaussian classifier and ART [33], [9], with the Generalized Regression Neural Network (GRNN) [37], which is a memory-based supervised learning neural network. GA learning aims to compress new training samples into one of the existing categories, which are based on PDFs and Bayes' theorem. In the event that no existing category is able to accommodate a new sample, a new category is created to encode the new sample. The GRNN learning process, on the other hand, is conducted online. It only needs to perform one-pass learning of the training data, and its learning process is instantaneous. Every hidden neuron in the GRNN functions as a kernel. Upon receiving a new input sample, the kernels are used to calculate the probability density function (PDF). In the event that a new input sample needs to be learned, the GRNN creates a new neuron to represent the new sample. The hybrid GART model is an improved version of GRNN that preserves the online learning properties of ART. It conducts unsupervised clustering to compress the training data samples into a few hidden categories (or hidden nodes), and a decision function for prediction is then constituted based on an enhanced GRNN. GART not only handles classification tasks but also solves regression problems with the capability of online learning.
In the GRNN, all the training samples are stored in the pattern layer. Fig. 2 shows the general GA structure. Category-$j$ of GA uses $M$-dimensional vectors $\mu_j$ and $\sigma_j$ to represent its center (mean) and its standard deviation, respectively. A scalar $n_j$ is used as its count of the number of training samples categorized into category-$j$.
During the training phase, an $M$-dimensional vector $A_k$ (the $k$th training sample) is presented to GA for unsupervised learning.
In this paper, instead of using the ordering algorithm with ART-based models for solving classification problems, we propose a new Enhanced Generalized Adaptive Resonance Theory (EGART)-based model, based on a hybrid of ART and the General Regression Neural Network (GRNN) [37], coupled with a simplified ordering algorithm for tackling regression problems.

III. THE PROPOSED ENHANCED GENERALIZED ADAPTIVE RESONANCE THEORY (EGART) AND THE THREE OPERATING STRATEGIES
This section presents the detailed algorithms of the proposed EGART model. Based on the foundation of GART in [36], we further enhance it with four main improvements to become EGART, in order to increase the robustness of the resulting model in handling regression problems. Firstly, the Laplacian loss function is used in place of the ε-insensitive loss function. Secondly, a Laplacian likelihood function (Equation 4) is adopted instead of a general exponential kernel function. Thirdly, a new definition of the vigilance function during the competition of ART is provided. Fourthly, a match tracking mechanism is added after the ART competition.
There are three operating strategies of the proposed model, i.e., (A) Enhanced GART with online learning (hereafter denoted as EGART); (B) EGART with data preprocessing by the ordering algorithm for the best order (hereafter denoted as Ordered-EGART); (C) initial training on a chunk of data samples using Ordered-EGART, followed by handling of new data samples by online learning with EGART (hereafter denoted as IO-EGART).
A. EGART
Fig. 3 shows the EGART structure for training and prediction. The core steps of the EGART learning algorithm are described as follows.
Training samples - Let the training samples presented to ART-a and ART-b be $\{(A_1, B_1), (A_2, B_2), \ldots, (A_k, B_k)\}$, where $A_k \in \mathbb{R}^M$ and $B_k \in \mathbb{R}^1$ are the input vector and kernel label of the $k$th training sample, respectively. In the following discussion, the equations and variables are based on the ART-a module with input sample $A_k$. The equations and variables of ART-b with input kernel label $B_k$ are the same, but with the subscript/superscript ''b'' instead of ''a''.
Competition - The input sample is presented to ART-a, with its kernel label to ART-b, for computation of the choice and match functions. The choice and match functions are defined based on Bayes' theorem. The Bayesian posterior probability of category-$j$ of ART-a given input sample $A_k$ is defined in (1), and the prior probability in (2). The term $P(A_k|j)$ is a Laplacian likelihood function, defined in (4), that measures the degree of similarity between $A_k$ and category-$j$, where $\mu^a_j$, $\sigma^a_j$, and $n^a_j$ are the center, standard deviation, and count of category-$j$ of ART-a, respectively. While GA [34] uses a standard quadratic loss function, EGART uses the Laplacian loss function. In the presence of noisy data, outliers have a strong influence on the quadratic loss function, which causes inaccurate recognition [38].
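The choice function of the competition step can be sketched numerically. The sketch below is a minimal illustration, assuming that the likelihood is a product of independent one-dimensional Laplace densities with location $\mu^a_j$ and scale $\sigma^a_j$, and that the prior of a category is proportional to its count $n^a_j$ (as in Gaussian ART [34]). The exact normalization used by EGART may differ, and the function and key names are hypothetical.

```python
import math

def laplacian_likelihood(a, mu, sigma):
    """Assumed likelihood P(a | j): product of 1-D Laplace densities."""
    return math.prod(
        math.exp(-abs(x - m) / s) / (2.0 * s)
        for x, m, s in zip(a, mu, sigma)
    )

def posterior(a, categories):
    """Bayesian posterior P(j | a) with count-proportional priors.

    categories: list of dicts with keys "mu", "sigma", "n" (hypothetical layout).
    """
    total = sum(c["n"] for c in categories)
    joint = [
        laplacian_likelihood(a, c["mu"], c["sigma"]) * c["n"] / total
        for c in categories
    ]
    norm = sum(joint)
    return [v / norm for v in joint]

categories = [
    {"mu": [0.2, 0.2], "sigma": [0.1, 0.1], "n": 3},
    {"mu": [0.8, 0.8], "sigma": [0.1, 0.1], "n": 1},
]
p = posterior([0.25, 0.20], categories)
first_round_winner = max(range(len(p)), key=p.__getitem__)
```

Here the sample lies close to the first center, so the first category receives almost all of the posterior mass and becomes the first-round candidate.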
To find the ''first round winner'' of the competition, two measures are needed, i.e., the choice function of ART-a as defined in (1) and the match function. The first-round winner of ART-a, denoted as $J$, is selected based on the choice function with the highest value. Its match function must be larger than or equal to the vigilance parameter, as stated in (6), where $\rho_a$ is a pre-defined vigilance parameter between 0 and 1.
Match Tracking - The winner needs to be verified before it can be declared as the ''final winner''. This is carried out with the map-field vigilance test in (7), where $\rho_b$ is a pre-defined vigilance parameter between 0 and 1. If the condition set in (7) is fulfilled, category-$J$ is declared as the final winner and is ready for weight adaptation, i.e., learning. If not, the match-tracking mechanism is triggered, i.e., a search based on (6) is conducted among the existing categories for a better candidate for the first-round winner. During match tracking, the initial first-round winner, which has failed to fulfill (7), is deactivated temporarily by setting its choice function to a negative value, i.e., $P(J|A_k) = -1$, and $\rho_a$ is temporarily increased to $\rho_a = V(A_k, J)$.
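The search with match tracking can be sketched as a loop over pre-computed values. This is an illustrative sketch only: the choice values, the ART-a match values $V(A_k, j)$, and the map-field match values are assumed to be supplied by the caller, and the function name `find_final_winner` is hypothetical. Candidates that fail the ART-a vigilance test are also deactivated here so that the search terminates, which is an assumption about the search order.

```python
def find_final_winner(choice, match_a, match_b, rho_a, rho_b):
    """Return the index of the final winner, or None if a new category is needed.

    choice[j]:  choice-function value of category j (non-negative).
    match_a[j]: ART-a match value V(A_k, j).
    match_b[j]: map-field match value of category j.
    """
    choice = list(choice)  # local copy, so deactivation is temporary
    while choice:
        J = max(range(len(choice)), key=choice.__getitem__)
        if choice[J] < 0:
            break  # every candidate has been deactivated
        if match_a[J] >= rho_a:          # ART-a vigilance test
            if match_b[J] >= rho_b:      # map-field vigilance test
                return J
            rho_a = match_a[J]           # match tracking: raise rho_a to V(A_k, J)
        choice[J] = -1.0                 # deactivate this candidate
    return None

winner = find_final_winner(
    choice=[0.6, 0.3, 0.1],
    match_a=[0.90, 0.80, 0.95],
    match_b=[0.20, 0.40, 0.90],
    rho_a=0.5,
    rho_b=0.7,
)
```

In this example the first candidate fails the map-field test, which raises $\rho_a$ to 0.90; the second candidate then fails the raised ART-a test, and the third candidate passes both tests.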
Learning - Learning involves the adjustment of the center, standard deviation, and count of the final winner by using the following equations.
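A minimal sketch of such an update is given below, assuming the running-mean and running-deviation rule of Gaussian ART [34]; the learning equations actually used by EGART may differ in detail. The function name and the dictionary layout are hypothetical.

```python
import math

def update_category(cat, a):
    """Update count, center, and standard deviation of the winning category.

    Assumed rule (Gaussian ART style):
      n     <- n + 1
      mu    <- (1 - 1/n) * mu + (1/n) * a
      sigma <- sqrt((1 - 1/n) * sigma^2 + (1/n) * (mu - a)^2), with the new mu
    """
    cat["n"] += 1
    w = 1.0 / cat["n"]
    cat["mu"] = [(1 - w) * m + w * x for m, x in zip(cat["mu"], a)]
    cat["sigma"] = [
        math.sqrt((1 - w) * s * s + w * (m - x) ** 2)
        for s, m, x in zip(cat["sigma"], cat["mu"], a)
    ]
    return cat

cat = update_category({"mu": [0.0], "sigma": [1.0], "n": 1}, [2.0])
```

Only the current sample and the stored statistics of the winner are needed, so the update is compatible with one-pass online learning.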
Addition of a new category - For the case where none of the existing categories that fulfill (6) is able to pass the map-field vigilance test (7), a new category is created to represent the new sample, where $\gamma_a$ is a pre-defined initial standard deviation.
During the prediction cycle, a new, unlabeled sample, $x$, is first presented to ART-a. The prediction of EGART is obtained by a distributed posterior probability estimation based on the GRNN algorithm.
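The creation of a category and the distributed prediction can be illustrated with a short sketch. It assumes that a new category is centered on the new sample with standard deviation $\gamma_a$ and a count of one, and that the prediction is the average of the ART-b centers weighted by count times a Laplacian likelihood. Both are assumptions in the spirit of GRNN, not necessarily the exact EGART expressions, and all names are hypothetical.

```python
import math

def new_category(a, b, gamma_a):
    """Assumed initialisation: center at the sample, count of one."""
    return {"mu_a": list(a), "sigma_a": [gamma_a] * len(a), "mu_b": b, "n": 1}

def predict(x, categories):
    """Weighted average of ART-b centers (GRNN-style, assumed form)."""
    weights = []
    for c in categories:
        lik = math.prod(
            math.exp(-abs(v - m) / s) / (2.0 * s)
            for v, m, s in zip(x, c["mu_a"], c["sigma_a"])
        )
        weights.append(c["n"] * lik)
    total = sum(weights)
    return sum(w * c["mu_b"] for w, c in zip(weights, categories)) / total

cats = [new_category([0.0], 0.0, 0.2), new_category([1.0], 1.0, 0.2)]
y_mid = predict([0.5], cats)    # halfway between the two centers
y_left = predict([0.1], cats)   # closer to the first center
```

Because every category contributes according to its weight, the output varies smoothly between the stored labels instead of jumping from one category to the next.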

B. ORDERED-EGART
The applicability and performance of ART-based neural networks have been reported in [9], [33], [34]. As these are online learning algorithms, their performance may be affected by the sequence, or order, of training data presentation [10]. To manage this situation, an Ordering Algorithm for FAM has been proposed [10]. Instead of presenting the data in random order, as in online learning, the ordering algorithm is a preprocessing method that computes the best order of presentation for the collected training data samples. However, such an algorithm requires a set of data samples to be collected in advance and involves preprocessing. The ordering algorithm is a type of min-max clustering algorithm that is used to find the presentation order of training data for FAM [10]. It requires a pre-defined number of cluster centers, which poses an obstacle to satisfying the online learning properties. One way to set this number is to define it as one larger than the number of classes [10]. Therefore, when FAM employs the ordering algorithm, it becomes an offline learning method. Instead of solving classification tasks using FAM with the ordering algorithm, this paper proposes a new EGART model combined with the ordering algorithm to solve data regression problems. At the same time, we also propose a new ordering strategy for regression, i.e., the ordering algorithm is simplified to use only one cluster center, since there is no class information involved in regression.
For the training of EGART with the ordering algorithm (Ordered-EGART), there is only one cluster center; hence, the original three-stage procedure of the ordering algorithm [10] is simplified to two stages, as follows:
Stage 1-Identify the cluster center (the first training sample in the sequence): For each of the $M$-dimensional input vectors, $A_k$, find the respective complement-coded [33] vector, $I_k \in \mathbb{R}^{2M}$, i.e., $I_k = (A_k, 1 - A_k)$. The $k$th input vector that has the largest value according to Equation (12) is selected as the first sample in the presentation sequence.
Stage 2-The presentation sequence of the remaining input vectors is determined based on the minimum Euclidean distance from the input vectors to the cluster center.
Once the order of presentation of the data is decided by the ordering algorithm, the ordered data are sent to EGART for training. This scheme is referred to as Ordered-EGART, as shown in Fig. 4.
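The two stages can be sketched as follows. The sketch assumes that the inputs are normalized to [0, 1]. The selection criterion of Stage 1 is passed in as a function; the default used here, $\sum_i |(1 - a_i) - a_i|$ computed from the complement-coded vector, is an assumed stand-in for Equation (12) and may not match it exactly. The function names are hypothetical.

```python
import math

def complement_code(a):
    """Complement coding: I = (a, 1 - a)."""
    return list(a) + [1.0 - v for v in a]

def default_score(a):
    # Assumed stand-in for the Stage 1 criterion (not taken from the paper).
    code = complement_code(a)
    m = len(a)
    return sum(abs(code[m + i] - code[i]) for i in range(m))

def simplified_ordering(samples, score=default_score):
    """Return the indices of `samples` in presentation order."""
    # Stage 1: the sample with the largest score becomes the cluster center.
    first = max(range(len(samples)), key=lambda k: score(samples[k]))
    center = samples[first]
    # Stage 2: remaining samples by increasing Euclidean distance to the center.
    rest = [k for k in range(len(samples)) if k != first]
    rest.sort(key=lambda k: math.dist(samples[k], center))
    return [first] + rest

samples = [[0.5, 0.5], [0.0, 1.0], [0.1, 0.9], [0.9, 0.2]]
order = simplified_ordering(samples)
```

With a single cluster center, the ordering reduces to one selection followed by one sort, which is why no class information or number of clusters has to be specified.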

C. IO-EGART
In most real-world applications with nonstationary data, a set of training inputs and their respective target outputs, $(A, B)$, may be available before the training of an EGART. In this scenario, the ordering algorithm is applied to the set of available training data samples before the training of EGART (i.e., Ordered-EGART) for best performance. As the application is nonstationary, new training data $(A', B')$ may become available after the training of Ordered-EGART is completed. To ensure that the trained EGART can continue to learn from the new training data, the online learning algorithm is applied to the EGART that has been trained with the Ordered-EGART strategy. This operating strategy is named Initial-Ordered-EGART (IO-EGART), as shown in Fig. 5.
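The IO-EGART strategy can be sketched as a two-phase training loop. The sketch is generic: `learn_one` is a hypothetical method name for a one-sample update, `Recorder` is a stand-in learner that only records what it is shown, and the ordering function used in the demonstration is a placeholder rather than the simplified ordering algorithm.

```python
def train_io(model, initial, stream, order):
    """Initial ordered training followed by one-by-one online learning.

    model:   any object with a learn_one(a, b) method (hypothetical API).
    initial: list of (a, b) pairs available before training.
    stream:  iterable of (a, b) pairs arriving after the initial phase.
    order:   function mapping a list of inputs to presentation indices.
    """
    for k in order([a for a, _ in initial]):
        model.learn_one(*initial[k])   # initial phase, ordered presentation
    for a, b in stream:
        model.learn_one(a, b)          # online phase, arrival order
    return model


class Recorder:
    """Stand-in learner that only records the samples it receives."""
    def __init__(self):
        self.seen = []

    def learn_one(self, a, b):
        self.seen.append((a, b))


def by_descending_first_input(inputs):
    # Placeholder ordering, used only for this demonstration.
    return sorted(range(len(inputs)), key=lambda k: -inputs[k][0])


model = train_io(
    Recorder(),
    initial=[([0.1], 1.0), ([0.9], 2.0), ([0.5], 3.0)],
    stream=[([0.3], 4.0), ([0.7], 5.0)],
    order=by_descending_first_input,
)
```

Only the initial chunk is reordered; samples that arrive later are learned in their arrival order, so the online learning properties hold after the initial phase.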

IV. EXPERIMENTS
Three versions of EGART are evaluated in this paper. Firstly, EGART is applied with no pre-collected training data. In other words, EGART is expected to learn from the very first training sample. Secondly, the simplified ordering algorithm is combined with EGART (Ordered-EGART). For this model, a collection of training samples is available, and the problem undertaken is a stationary one. Note that by using the simplified ordering algorithm to determine the presentation sequence of training data, Ordered-EGART violates Property-1 of online learning. The advantage is that Ordered-EGART is expected to exhibit a lower error rate and a smaller network size than EGART, in line with the theory advocated by the ordering algorithm [26]. Another advantage of using the simplified ordering algorithm is that the training of Ordered-EGART becomes more straightforward, thus resulting in a shorter training time. Thirdly, the concept of an ''initial training'' for EGART with the simplified ordering algorithm is proposed (IO-EGART). For this model, it is assumed that some data samples are available for training, and the simplified ordering algorithm is used to determine the presentation sequence of these data samples. After the initial training, the trained IO-EGART model is ready to perform online learning. This means that the simplified ordering algorithm is not used to process subsequent new data samples that become available after the initial training. Note that Ordered-EGART can be extended to IO-EGART if the new data samples are presented one by one after the initial training without further ordering.
In this paper, all three EGART-based models (i.e., EGART, Ordered-EGART, IO-EGART) are evaluated using seven regression data sets from various application domains. Ordered-EGART is expected to produce the best prediction accuracy rates with the simplest network structures, but without the properties of online learning. On the other hand, EGART is a fully online learning model, but may yield lower prediction accuracy rates with more complex network structures. Finally, IO-EGART is expected to be a compromise between EGART and Ordered-EGART that preserves the online learning properties.
Seven regression data sets are used for evaluation, i.e., SinE (artificial mapping data), Delta Ailerons (aircraft control), Boston Housing (economy), Santa-Fe Series-E (astrophysical data), Abalone (physical measurement), Thermal Interface Location (fire safety engineering), and Fire Evacuation Time (building evacuation exercise). The same training data samples are used for training the three EGART-based models; the difference lies in the operating strategies. EGART handles all training samples online, Ordered-EGART considers all training samples offline with the ordering algorithm in place, and IO-EGART takes half of the training samples for preprocessing with the ordering algorithm, while the remaining half is presented and learned one by one online. The test procedure is likewise based on the same set of test data samples. The Root Mean Square Error (RMSE) is used as the performance indicator. The training and test cycles are repeated 50 times. The average number of categories (nodes) created over 50 runs is reported. The final error rate is obtained by using the bootstrap method [30], [31], i.e., the bootstrap mean with 1,000 re-samplings.
Bootstrapping is a technique for approximating the variability of a statistic when the underlying sampling distribution of that statistic is unknown or difficult to estimate. It is handy for computing the statistics of population parameters in situations with small or limited data samples [30]. The principle of bootstrapping for the calculation of the mean of a set of data samples is as follows:
1. A data set is first acquired. Assume that the data set $X = \{x_1, \ldots, x_n\}$ is acquired, where $n$ is the size of the sample drawn from a completely unknown probability distribution $F$, $\psi$ is the average of all the values in the data set $X$, and $N$ is the total number of times bootstrapping is repeated.
2. Select a random sample of $n$ data points with replacement from the data set $X$. This new set of data, $X^*$, is the bootstrap sample.
3. Calculate the mean of the bootstrap sample $X^*$, denoted $\psi^*_1$.
4. Re-sample the data by repeating steps 2 to 3 $N$ times to get the bootstrap estimates $\psi^*_1, \psi^*_2, \psi^*_3, \ldots, \psi^*_N$.
For all experiments, parameters $\rho_a$ and $\gamma_b$ were set to their default values, i.e., 0 and 1, respectively. Parameter $\gamma_a$ was obtained after several trials, and parameter $\rho_b$ was varied from 0.5 to 0.95 (while the other parameters were fixed) in order to evaluate the effectiveness (network size and error rate) of the three EGART-based models.
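The bootstrap procedure described above can be sketched in a few lines of standard-library Python. The sketch assumes that the reported ''bootstrap mean'' is the average of the $N$ bootstrap estimates; the function name, the seed, and the example RMSE values are illustrative and are not taken from the experiments.

```python
import random
import statistics

def bootstrap_mean(x, n_resamples=1000, seed=0):
    """Return the mean of the bootstrap estimates and the estimates themselves."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(x)
    estimates = [
        statistics.fmean(rng.choices(x, k=n))  # n draws with replacement
        for _ in range(n_resamples)
    ]
    return statistics.fmean(estimates), estimates

# Hypothetical RMSE values from repeated runs (placeholders, not reported results).
rmse_runs = [0.051, 0.048, 0.055, 0.050, 0.047, 0.053]
psi_boot, psi_estimates = bootstrap_mean(rmse_runs, n_resamples=1000)
```

The spread of `psi_estimates` also gives an empirical indication of how much the mean error would vary across repeated experiments.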

A. SINE
In this experiment, the three EGART-based models were used to estimate a fast-changing continuous function known as ''SinE'' [24], [36]. A total of 50 runs were carried out, each of which contained 3,000 training samples. The 3,000 training samples were composed of randomly generated $x$ values between 0 and 10, with their respective $y$ values. The test samples were produced with the same procedure, and a total of 1,500 samples were created. Table 1 shows the results of the three EGART-based models for various values of $\rho_b$. Compared with EGART, IO-EGART and Ordered-EGART created smaller networks (by 2.52% to 24.98% and by 13.67% to 51.84%, respectively, for various values of $\rho_b$) and achieved smaller error rates (by 29.11% to 53.85% and by 42.86% to 62.01%, respectively). Table 2 shows the best results and standard deviations of the three EGART-based models, as well as the results of other methods in [24], [36]. All three EGART-based models achieved smaller error rates as compared with those in [24], [36]. Their error rates were 55.76% (EGART), 70.63% (IO-EGART), and 74.72% (Ordered-EGART) lower than that of GAPRBF (the best in [24], [36]).

B. DELTA AILERONS
This set of data was concerned with the control of ailerons on an F16 aircraft [39]. A total of 7,192 samples were available. Each data sample consisted of five continuous variables (roll rate, pitch rate, current pitch, current roll, and difference of roll rate) and its respective label (control action).
According to [26], [39]-[41], 3,000 of the 7,192 samples were used as the training set, and the remaining samples as the test set. The input and output variables were normalized between 0 and 1, and the RMSE was derived based on the same range. Table 3 shows the results (RMSE and number of categories) of the three EGART-based models for various $\rho_b$. Again, IO-EGART and Ordered-EGART established smaller network sizes (by 1.88% to 12.88% and by 10.62% to 38.66%, respectively, for various $\rho_b$) and produced slightly lower error rates than those of EGART. Table 4 shows the best results and standard deviations of the three EGART-based models. All three EGART models achieved lower error rates as compared with those reported in [26], [39]-[41].

C. BOSTON HOUSING
This data set was concerned with predicting the housing values in Boston. There were 506 samples. Each sample consisted of 13 input attributes (12 continuous attributes and one binary attribute) and one continuous target output. According to [24], 481 of the samples were selected randomly as the training set, and the remaining samples as the test set. Again, all input and output variables were normalised between 0 and 1, and the RMSE was derived based on the same range. Table 5 shows the results (RMSE and number of categories) of the three EGART-based models for various $\rho_b$. Compared with EGART, the best improvements of Ordered-EGART are 15.15% (network size) and 9.32% (prediction error rate). Table 6 shows the best results and standard deviations of the three EGART-based models, and a comparison with those of other methods in [24]. All three EGART-based models achieved significantly lower error rates; for example, the error rate of Ordered-EGART was 26.77% and 71.07% lower than those of MRAN and RAN [24], respectively.

D. SANTA-FE SERIES-E
This problem was concerned with the Santa-Fe Time Series Competition, specifically Series-E [43]. The astrophysical data (univariate time series data with only one variable) were noisy, discontinuous, and nonlinear. In accordance with [42], [44], a total of 2048 samples were taken into account, each with five inputs and one output, with $x_t$ being the intensity of the star at time $t$. The first 90% of the data samples were used as the training set, and the remaining samples were used as the test set. Table 7 shows the results (MSE and number of categories) of the three EGART-based models for various $\rho_b$. Compared with EGART, the best improvements of Ordered-EGART are 24.80% (network size) and 1.86% (prediction error rate). Table 8 shows the best results of the three EGART-based models, and a comparison with those of other methods in [42], [44]. The error rates of three EGART-based models are slightly better than that of GRNNFA [34], but better than those reported in [44].

E. ABALONE
The abalone data set [26] consists of 4177 samples for predicting the age of abalone based on physical measurements. Each sample comprises eight continuous input attributes (physical measurements) and one integer output (age of abalone). In each run, 2784 training samples and 1393 testing samples were randomly selected from the data set. Table 9 shows the results (RMSE and number of categories) of the three EGART-based models for various values of $\rho_b$. For this data set, compared with EGART, the best improvements of Ordered-EGART are 20.35% (network size) and 2.60% (prediction error rate). Table 10 shows the best results of the three EGART-based models, and a comparison with Support Vector Regression (SVR), Least Square Support Vector Regression (LSSVR), Extreme Learning Machine (ELM), and Online Sequential Extreme Learning Machine (OSELM). The model with the smallest error is LSSVR, followed by the three EGART-based models, all of which have smaller errors than SVR, ELM, and OSELM.

F. THERMAL INTERFACE HEIGHT
The dividing line between cold air and hot gas layers of a compartment fire is called the thermal interface. The height of the interface is affected mainly by the mass of the air that is entrained into the fire plume. A set of full-scale steady-state experiments of flow induced in a single compartment fire was reported in [46], and 55 samples were collected. Each sample consisted of six attributes, i.e., size (height and width) of the sill of the opening, location of the fire bed, strength of the fire source, and ambient temperature, as well as its respective label (i.e., thermal interface height).
In general, a thermal interface's height can be approximated based on the profile of the room temperature along the room height. The thermal interface's height is defined as the height of the profile with the largest change of room temperature. However, due to the effects of mixing and diffusion of fluids, the height of the thermal interface cannot be stably measured. The thermal interface height may lie within an error envelope ranging from ±8% to ±50% of the estimated height [46]. Therefore, for each sample, an error value was included (in addition to the label) to form the envelope of thermal interface height.
Since the data set contained only 55 samples, the leave-one-out (LOO) method was utilized. In each LOO run, 54 of the 55 samples were used as the training set and the remaining one was used as the testing set. Note that, in addition to RMSE and number of categories, the prediction accuracy (the share of predictions within the acceptable envelope range) was included. Table 11 shows the results (RMSE, accuracy rate, and number of categories) of the three EGART-based models for various values of ρ_b. Fig. 4 shows the prediction of Ordered-EGART as compared with the envelope range of thermal interface heights. Table 12 shows the best results of EGART and Ordered-EGART as compared with those of [47], Extreme Learning Machine (ELM) [40] and Online Sequential Extreme Learning Machine (OSELM) [27]. ELM is an offline learning algorithm, similar to Ordered-EGART. On the other hand, OSELM is an extension of ELM that requires some data samples for initial training before online learning, so its learning strategy can be deemed similar to that of IO-EGART. The performances of the models are recorded in terms of accuracy, i.e., the share of predictions that fall within the envelope range as opposed to outside it. The results appear consistent: 94.55% for the offline algorithms (Ordered-EGART, ELM, and GRNNFA) and 89.09% for the models trained online (EGART, IO-EGART, and OSELM). The results are rather close because the data set is small and the LOO strategy is used.
Although Ordered-EGART and GRNNFA achieved the same accuracy rate (94.55%), Ordered-EGART used only one LOO training cycle, since the simplified ordering algorithm was used to ascertain the presentation sequence of training samples. However, GRNNFA applied 20 LOO cycles [37] with different presentation sequences of training samples before obtaining the final average results, as in Table 12. Therefore, Ordered-EGART has the advantage of simplicity and of a faster processing time (with only one LOO cycle).
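The LOO protocol and the envelope-based accuracy used in this subsection can be sketched as follows. This is a hedged illustration only: it assumes the envelope is the label plus or minus a relative error, the mean-label predictor is a hypothetical stand-in for an EGART-based model, and the four toy samples are invented. The reported rates are consistent with 52 of 55 predictions (94.55%) and 49 of 55 predictions (89.09%) falling within the envelope.

```python
def within_envelope(prediction, label, rel_error):
    """True if the prediction lies in label +/- rel_error * |label| (assumed form)."""
    half_width = abs(label) * rel_error
    return label - half_width <= prediction <= label + half_width


def leave_one_out_accuracy(samples, fit, predict):
    """Hold out each sample once, train on the rest, count envelope hits."""
    hits = 0
    for i, (x, label, rel_error) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        model = fit(training)
        if within_envelope(predict(model, x), label, rel_error):
            hits += 1
    return hits / len(samples)


# Hypothetical stand-in model: always predicts the mean training label.
def fit_mean(training):
    return sum(label for _, label, _ in training) / len(training)


def predict_mean(model, x):
    return model


# Toy samples: (inputs, label, relative error of the envelope). Not real data.
toy = [((0.0,), 1.0, 0.5), ((1.0,), 1.2, 0.08),
       ((2.0,), 0.9, 0.5), ((3.0,), 2.0, 0.08)]

accuracy = leave_one_out_accuracy(toy, fit_mean, predict_mean)
print(accuracy)
print(round(100 * 52 / 55, 2), round(100 * 49 / 55, 2))
```

With 55 samples, each additional hit or miss shifts the accuracy by about 1.8 percentage points, which is why the offline and online groups differ by only a few predictions.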

G. FIRE EVACUATION TIME
In this experiment, the EGART models were applied to building evacuation. The study focused on predicting the time taken for people to evacuate in the event of a fire emergency in a typical karaoke center in Hong Kong. The evacuation time is distinctly linked to normal human behaviors such as route choice and behavioral responses to the fire alarm. Fig. 7 depicts a common floor plan of a typical karaoke center in Hong Kong.
FIGURE 6. Prediction of Ordered-EGART, IO-EGART and EGART for the thermal interface height problem (sorted ascending). For results of Ordered-EGART, out of 55 samples, 52 (or 94.55%) have been successfully predicted to be within the envelope.
In a standard karaoke center in Hong Kong, the rooms open onto a straight corridor, which is directly connected to a lobby area with a clear exit. It is of interest to investigate the relationship between the layout variables of the karaoke center and the evacuation time. The evacuation time is defined as the time taken for everyone to evacuate the center, starting from the time the fire alarm sounds.
For this case study, there were three layout parameters or variables, i.e., the number of rooms (ranging from 4 to 22 rooms), width of the corridor (ranging from 1.05 meters to 1.4 meters), and the area of the lobby (ranging from 15 square-meters to 35 square-meters). The variables were varied in order to examine their effects on the evacuation time. This evacuation time was determined by using the Spatial Grid Evacuation Model [48]. There were a total of 750 data samples for experimentation, with 550 training samples and 200 test samples. Table 13 shows the results (RMSE and number of categories) of the three EGART-based models for various values of ρ_b. Compared with EGART, the best improvements of Ordered-EGART are 18.30% (network size) and 3.27% (prediction error rate). Table 14 depicts the best results of the three EGART-based models, and a comparison with ELM, OSELM, and GRNN. Ordered-EGART achieved the smallest error rate as compared with the other models.
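The improvement percentages quoted above can be read as relative reductions with respect to EGART. The formula is not stated in the text, so the sketch below is an assumption, and the function name `relative_improvement` and the numbers passed to it are purely hypothetical.

```python
def relative_improvement(baseline, improved):
    """Percentage reduction of `improved` relative to `baseline` (assumed formula)."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical category counts, not taken from Table 13.
size_gain = relative_improvement(baseline=200, improved=150)
print(size_gain)
```

The same function would apply to the error rate, with the RMSE of EGART as the baseline.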

V. SUMMARY
In this paper, an enhanced GART neural network, known as EGART, with a simplified ordering algorithm for determining the presentation sequence of training data has been proposed. Three operating strategies of EGART, i.e., fully online (EGART), semi-online (IO-EGART), and offline (Ordered-EGART) operations, have been suggested. The effectiveness of the three EGART-based models in tackling regression problems has been demonstrated using four benchmark problems (SinE, Delta Ailerons, Boston Housing, and Sante-Fe Series-E). In addition, the three EGART-based models have been applied to fire safety engineering, i.e., to the prediction of the thermal interface height in a single compartment fire and of fire evacuation times. All the experimental results have been quantified by the bootstrap technique. The results positively demonstrate that EGART, IO-EGART, and Ordered-EGART perform better than other machine learning methods in tackling regression problems.
Further research is focused on improving the EGART-based models. A pruning algorithm for the EGART-based models is needed to reduce the network complexity in the case when too many redundant categories are generated during the training cycle. On the other hand, Ordered-EGART can be used for designing an ensemble of neural networks for handling regression problems. The training samples can be divided into several blocks, and the simplified ordering algorithm is applied before presenting the samples to Ordered-EGART for learning. A cooperative multiple Ordered-EGART regression model can also be investigated to provide better solutions when a single estimator fails to solve the underlying problem.
CHEE PENG LIM received the Ph.D. degree in intelligent systems from The University of Sheffield, Sheffield, U.K., in 1997. He is currently an Associate Professor with the Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC, Australia. He has authored more than 350 technical papers in journals, conference proceedings, and books, and edited three books and 12 special issues in journals. His research interests include computational intelligence-based systems for data analytics, condition monitoring, optimization, and decision support. He served on the Editorial Board for five international journals. He received seven best paper awards.
ERIC WAI MING LEE received the B.Eng. degree (Hons.) in building services engineering and the Ph.D. degree in fire engineering from the City University of Hong Kong. He is a Professional Engineer and currently a Panel member of the Fire Discipline of the Hong Kong Institution of Engineers, the Assessor of the Hong Kong Laboratory Accreditation Scheme of the HKSAR Government, and a member of various technical committees of the Buildings Department of the HKSAR Government.