Development of a Hybrid Machine Learning Model for Asphalt Pavement Temperature Prediction

Machine learning (ML) models are excellent alternatives for modelling complex engineering problems with high reliability and accuracy. This paper presents two extensively explored models for predicting asphalt pavement temperature, Markov chain Monte Carlo (MCMC) and random forest (RF), and hybridises them into an RF-based multiple MCMC regression (RF-MCMC) for the optimal prediction of asphalt pavement temperature. Thermal instruments were used to measure the asphalt pavement temperature in the Gaza Strip, Palestine, at two-hour intervals from March 2012 to February 2013, and the temperature data were used to model the pavement temperature. More than 7200 measured pavement temperatures were used to train and validate the proposed models. The validation showed that the ML models are satisfactory, and the modelling results confirmed the value of the proposed hybridisation in predicting asphalt pavement temperature levels. The developed hybrid regression model achieved acceptable and better prediction results, with a coefficient of determination (R2) of 0.96. Overall, the results confirm the significance of the proposed hybrid model as a reliable computer-aided alternative for predicting asphalt pavement temperature.


A. RESEARCH BACKGROUND
Asphalt flexible pavements are critical components of transport and highway infrastructure and are more widely used than other materials because of their superior service performance [1][2][3]. The basis of flexible pavement design is climatic conditions and axle load limits [4]. Ninety per cent of the pavements constructed throughout the world use asphalt binders [5].
The design of asphalt pavements depends on the structural capacity of the pavement layers, which is influenced by the applied stress and recoverable strain in the resilient modulus test, involving different temperature ranges, loads, rest periods and loading axes [6,7]. However, air temperature can cause severe damage to asphalt pavements and adversely affect pavement performance [8]. Therefore, it is essential to consider air temperature when designing flexible pavements: with changing temperature, the asphalt binder changes from a Newtonian fluid to a viscoelastic fluid to a viscoelastic solid [9][10][11]. The cumulative loading and environmental exposure of asphalt pavements lead to a loss of structural and functional characteristics that reduces serviceability [12,13].
The mechanistic-empirical design approach characterises most of the materials used in asphalt pavements and their performance, and it is used to determine the parameters affecting the asphalt pavement during different seasons [14][15][16][17]. An appropriate asphalt pavement temperature model is critical to ensure the safety and comfort of road users and extend the service life of road pavements [18,19]. Over the past few decades, researchers have developed many regression models to predict asphalt pavement temperatures, and some of these models have good accuracy. However, they require several input parameters for each prediction equation, each with its strengths and weaknesses [20,21]. The accuracy of a statistical method can only be established within the range of the original data used to develop the multiple linear regression (MLR) models [22]. With technological advances, engineers now use computers and software to analyse data [22][23][24].

B. LITERATURE REVIEW
Barber was one of the first researchers to develop a thermal conductivity model for calculating the internal temperature profile of asphalt pavements [25]. In 1987, the Strategic Highway Research Program (SHRP) introduced the Long-Term Pavement Performance (LTPP) program, a 20-year research effort to improve pavement characterisation at selected sites of the Seasonal Monitoring Program (SMP) [27][28][29]. In 1993, Lytton et al. developed an enhanced integrated climatic model (EICM) to predict the heat of pavement structures caused by changes in climate parameters [26]. In the same year, researchers used heat transfer theory to develop and validate a model for the summer using the highest pavement surface temperature [30]. Matic et al. developed a model to predict pavement temperature at particular depths using air temperature and other climate parameters [29]. In 2015, Islam et al. developed a series of statistical models for predicting the minimum, maximum and average pavement temperatures using field measurement data from an instrumented road in New Mexico, USA [31]. Researchers installed two weather stations at a project site in Ashtabula, Ohio, USA, to collect climate data and used the data to develop regression models for predicting the maximum and minimum daily temperatures [32]. Khan et al. developed and validated a regression profile to determine the average temperature in asphalt pavement layers; the temperature data came from a road segment near Albuquerque, New Mexico, USA, and the average depth for measuring the mean temperature was 2.54 inches [33].
In 2018, Li et al. developed and validated a statistical model for predicting the temperature at depths of more than 30 cm in Alberta, Canada. Hourly pavement temperature measurements and meteorological data were collected at three selected sites; the model was validated, and its applicability was verified using the available data from other locations [35]. In recent years, researchers have shown increasing interest in using artificial intelligence techniques, for example, machine learning (ML) and deep learning, to analyse and predict pavement temperature parameters [4,36]. In 2014, a study in Serbia used artificial neural networks to develop models for predicting the minimum and maximum pavement temperatures based on the pavement's surface temperature and depth [23]. The existing hybrid models for predicting asphalt pavement temperature consider the impact of environmental factors on the pavement. However, these models do not consider the heat reflected from the heated pavement, which contributes to heat accumulation in the surrounding environment, as in urban heat island (UHI) effects. Therefore, it is essential to conduct studies that take this factor into account.

C. MOTIVATION FOR THE RESEARCH
In the last decade, researchers have used ML techniques to develop pavement temperature prediction models based on parameters such as solar radiation and air temperature. Their research was limited to the application of algorithms such as gradient boosting regression trees, backpropagation (BP) neural networks, Gaussian process regression (GPR) models, support vector machine (SVM) regression trees, ensembles of trees, linear regression models, and random forest [37]-[42]. There is an urgent need for a hybrid model to address these gaps. Specifically, researchers have proposed different ways to set the RF parameters in hybrid RF and MCMC models, since too many trees can considerably slow down an algorithm. Although these approaches could enhance the effectiveness of MCMC in various applications, their limitations manifest particularly in RF parameter selection. The algorithms have unique main aspects, strengths and weaknesses, and the idea of merging these concepts holistically to combine their advantages has not been extensively explored in the literature.

D. RESEARCH OBJECTIVES
Based on the review of the reported literature, limited studies have used machine learning to achieve better accuracy in predicting asphalt pavement temperature. The proposed model was built using ML and field data from selected sites in the Gaza Strip, Palestine, and was used to predict the asphalt temperature at varying depths. This study used the hybrid ML algorithms of random forest (RF) and Markov chain Monte Carlo (MCMC) regression to predict asphalt pavement temperature. Three input combinations, selected using correlation statistics, were tested to identify the most influential parameters. At the same time, the validations were compared with other models, such as artificial intelligence and empirical formulations. The remainder of this paper is organised as follows. Section 2 explains the observation location and data acquisition, Section 3 discusses the applied ML models, Section 4 presents the results and discussion, and Section 5 compares the results of this study with previous studies.

II. OBSERVATION LOCATION AND DATA ACQUISITION
The location for measuring pavement temperature is in Gaza, Palestine. An observation station was set up to measure pavement and air temperature at depths of 0 cm, 2 cm, 5.5 cm, and 7 cm at different time intervals during the four seasons of one year, as shown in Figure 1. The location of the study area is shown in Figure 2. Manual measurements were performed to collect the data from March 2012 to February 2013 in the Gaza Strip, Palestine. The locations were pre-determined, and the conditions at the data collection sites are well known [35]. The data from the measurement station comprises three independent variables (air temperature, time, and depth) and one dependent variable (asphalt pavement temperature).

III. APPLIED MACHINE LEARNING MODELS
This study predicts asphalt temperature (y) using air temperature (x₁), time of the day (x₂), and depth of measurement (x₃). In a machine learning problem, the data comprises target data and input data; here, y is the target data and x = [x₁ x₂ x₃] is the input data. The linear regression formula for the data is given by Equation (1):

yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + β₃x₃ᵢ (1)

where yᵢ is the i-th data point of y, and x₁ᵢ, x₂ᵢ, and x₃ᵢ are the i-th data points of air temperature, time of the day, and depth. The unknowns β₀, β₁, β₂, and β₃ are the regression coefficients, which are calculated using the MCMC method explained in the following subsection.
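As a small illustration, Equation (1) can be evaluated directly; the coefficient and input values below are purely illustrative, not fitted values from the paper.

```python
import numpy as np

def linear_model(beta, x):
    """Evaluate Equation (1): y = b0 + b1*x1 + b2*x2 + b3*x3.

    beta : array of 4 regression coefficients [b0, b1, b2, b3]
    x    : array of shape (n, 3) with columns
           [air temperature, time of day, depth]
    """
    return beta[0] + x @ beta[1:]

# Illustrative coefficient values only (not from the paper):
beta = np.array([2.0, 1.1, 0.05, -0.2])
x = np.array([[30.0, 14.0, 2.0]])   # 30 deg C air temp, 14:00, 2 cm depth
print(linear_model(beta, x))        # predicted asphalt temperature
```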

A. MARKOV CHAIN MONTE CARLO
According to a survey, the Metropolis algorithm is among the ten algorithms that most effectively influenced the progress of science and engineering in the last century [43], [44]. The MCMC algorithm is an example of a large class of sampling algorithms that play a significant part in computing science and statistics. The technique draws samples such that each new sample depends on the existing sample, forming sequences of probabilistically related events that reach a solution within a reasonable period [45], [46]. This type of ML has the advantage of making reversible jumps because of its flexibility to include many possible moves based on the selection strategy, compared to other schemes that rely on indicator mix selections or diffusion processes [47]. The reversible jump is a tricky and time-consuming mechanism of this technique for reversible engineering actions. In addition, the technique is often used for integration or optimisation problems in high-dimensional spaces. MCMC is a class of algorithms used for sampling an unknown distribution when a set of observations is given [48]. The Metropolis-Hastings algorithm was chosen because of its suitability for multidimensional sampling problems [49], [50]; this application requires a four-dimensional sample, as shown in Figure 3. When using MCMC to solve regression problems, the regression coefficients start as guess values (or random values). The algorithm iterates using random processes and a given distribution, known as the transition function. Here, the Gaussian normal distribution given by Equation (2) is used [49].
In each iteration, the transition function generates new regression coefficients using a proposal variance σ². A large variance can cause the maxima to be missed, while a very small variance increases the number of iterations, and the algorithm may be trapped at a local maximum. This study chose σ = 0.0003 based on the chain performance, with the proposal mean μ set to the current coefficient values θ. This transition was achieved using a normal distribution random number generator; hence, the transition is expressed as θ_new = N(θ, σ). The next step in the Metropolis-Hastings algorithm is deciding whether to accept or reject the new regression coefficient values generated by the transition function. For this purpose, the ratio of posterior probabilities is calculated and compared with a random number. If the ratio of posteriors is greater than the random number, the values of θ_new are accepted; otherwise, the values are rejected. This process is expressed using Equation (3) [49].
where α is the acceptance ratio and the function p is the posterior. The probability of acceptance is defined as p(accept) = min(1, α), which constrains the value of the acceptance probability between 0 and 1. A random number u is then generated from a uniform distribution, U; if u is smaller than the acceptance probability, the proposed coefficients are accepted, and otherwise they are rejected. Figure 3 shows the generalised working of MCMC, and Figure 4 shows the flowchart for implementing the Metropolis-Hastings algorithm. Furthermore, deciding on the acceptance or rejection of the proposed regression coefficients requires a sample distribution that can roughly be considered the posterior. Following the Bayes formula, the posterior can be defined using Equation (4) [50].
The denominator in Equation (4) is the normalising constant and cannot be evaluated analytically because the posterior is unknown. Therefore, the term is ignored when calculating the approximate posterior for the acceptance ratio, and Equation (4) reduces to Posterior ∝ Likelihood × Prior; taking the ratio of two posteriors then gives an approximate posterior ratio. The prior and likelihood can be taken as simple distributions from which samples can be obtained. This study took the normal distribution as the likelihood, representing τ as a precision parameter for simplicity, as given in Equation (5).
where α = 1 and β = 1 are the parameters of the gamma prior on the precision τ; σ is the standard deviation for each value of y, which is updated in each iteration during the training. Hence, by substituting the likelihood and prior from Equations (5) and (6), the posterior can be written as Equation (7).
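The accept/reject loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: a flat prior is assumed for brevity (so the posterior ratio reduces to the likelihood ratio, whereas the paper uses a gamma prior on the precision), and the step size, iteration count, and synthetic data are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_likelihood(beta, x, y, tau=1.0):
    # Gaussian likelihood with precision tau, analogous to Equation (5)
    resid = y - (beta[0] + x @ beta[1:])
    return -0.5 * tau * np.sum(resid ** 2)

def metropolis_hastings(x, y, n_iter=20000, step=0.02):
    beta = np.zeros(4)                     # initial guess for the coefficients
    ll = log_likelihood(beta, x, y)
    for _ in range(n_iter):
        # Gaussian transition function centred on the current state (Eq. (2))
        proposal = rng.normal(beta, step)
        ll_new = log_likelihood(proposal, x, y)
        # accept with probability min(1, posterior ratio), compared
        # against a uniform random number (Eq. (3))
        if np.log(rng.uniform()) < ll_new - ll:
            beta, ll = proposal, ll_new
    return beta

# Synthetic sanity check with known coefficients (illustrative values only)
x = rng.uniform(0.0, 1.0, size=(200, 3))
y = 0.5 + x @ np.array([1.0, -0.5, 0.3])
beta_hat = metropolis_hastings(x, y)
```

Working with log-likelihoods, as here, avoids numerical underflow when forming the posterior ratio of Equation (3).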

B. RANDOM FOREST
Ho introduced the random forest in 1995, and Breiman further developed the algorithm in 2001 [51]. RF is a computationally efficient technique for large data, and many research projects and real-world applications have successfully used RF in various fields [52]-[55]. The RF algorithm is an excellent ensemble learning method built on classification and regression trees (CART). It depends on the bootstrap resampling algorithm to extract multiple samples from the basic data; in short, it is based on statistical learning theory [56], [57]. The predictions of the individual decision trees are combined and averaged to obtain the best prediction output, and randomly resampling the data increases the diversity of the decision trees and changes the combined prediction of each tree. Because this study focused on regression, priority was given to the regression tree (RT). Branching at each leaf node is formulated using the mean square error (MSE) of the samples in each regression tree branch; branching on the minimum leaf-node MSE continues until no more features are available or the optimal total MSE is satisfied, at which point the growth of the regression tree ends. The algorithm has two custom parameters: the number of RTs and the number of random variables per node, known as the max depth [52]. The parameters are optimised by minimising the data processing-related errors. The modelling of the RF algorithm used bootstrap sampling-based extraction of N estimator training sets from the data. Each training set comprises about two-thirds of the original dataset; the remaining one-third is not part of each tree's bootstrap sample and is not used during training [36]. This portion of the dataset is the untrained (out-of-bag) data.
A regression tree is then generated for every bootstrap training set, and the N estimator RTs together constitute a 'forest'. When growing each tree, the optimal attribute is not always picked at every internal node for branching; instead, the attribute is selected randomly based on the max depth attributes. The RF algorithm thus creates several training sets to enhance the differences between the regression models and improve the extrapolation prediction ability of the combined regression models. The regression model sequence h1(x), h2(x), ..., hN(x) was derived using N-time model training, and this sequence was used to build the multiple regression model system known as a forest. The prediction outcomes of the N estimator RTs were pooled and averaged to calculate the value of the updated sample in the search space. Figure 5 shows the RF model architecture. In this study, the task is regression because the target data, asphalt temperature, is continuous; the term 'Random Forest' in this paper therefore indicates 'Regression Forest' henceforth. RF can be defined as a collection of tree-structured classifiers/regressors h(x, Θk), k = 1, ..., N, where the Θk are independent identically distributed random vectors. Each tree casts a unit vote for the most popular class at input x in the classification task, while the mean of the outputs of all the trees is taken in the regression task, as shown in Figure 5 [58]. The RF implementation in this study was adapted from Criminisi et al. [55]. The output of the forest is the average over all the trees, as given in Equation (8).
where p(y|x) is the averaged posterior over all the trees, and T is the number of trees. The posterior of each individual tree was obtained by training the tree with an objective function that maximises the information gain, as given by Equation (9).
where Λ is a conditional covariance matrix, Sj is the training data available at node j, and SjL and SjR are the left and right node splits of the data, respectively. However, the output in this dataset is a single-variable array, and the information gain is expressed as Equation (10), where ŷ is the estimated output and ȳ is the mean of the training samples at the j-th node. In this maximum-information-gain process, the data arriving at node j are split into the right or left child. The data split is governed by the decision-making component of the tree called the 'weak learner', given by Equation (11).
where φ(x) represents the feature space of the data, ψ is the geometric shape representing the data separation entity, and τ1 and τ2 are the constraining variables trained in each iteration. In this study, the data separators are vertically aligned with the y-axis; such axis-aligned separators are called 'stumps' [59]. Decision stumps separate the data to show the relationship between each input parameter and the asphalt temperature. The decision stumps for time and depth are simple and directly defined because of their discrete nature. However, for the relationship between air temperature and asphalt temperature, the decision stumps have to be calculated separately in each training iteration depending on the difference between the two consecutive values to be separated.
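The bootstrap-and-average mechanism described in this subsection can be illustrated with a deliberately tiny from-scratch forest of depth-1 regression trees (stumps) on a single feature. All function names, data, and forest sizes here are ours and purely illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(x, y):
    """Fit a depth-1 regression tree ('stump') on one feature:
    pick the threshold minimising the total squared error (MSE criterion)."""
    best = (np.inf, None, 0.0, 0.0)
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]                      # (threshold, left mean, right mean)

def forest_predict(stumps, x):
    # Regression forest: average the outputs of all trees (Equation (8))
    preds = [np.where(x <= t, lo, hi) for t, lo, hi in stumps]
    return np.mean(preds, axis=0)

# Bootstrap-train a small forest of stumps on toy 1-D data
x = rng.uniform(0, 40, 300)              # e.g. air temperature
y = np.where(x > 20, 35.0, 15.0)         # toy asphalt-temperature target
stumps = []
for _ in range(25):
    idx = rng.integers(0, len(x), len(x))  # bootstrap: ~2/3 unique points
    stumps.append(fit_stump(x[idx], y[idx]))
print(forest_predict(stumps, np.array([10.0, 30.0])))  # -> [15. 35.]
```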

C. Hybrid Algorithm: RF-Based Multiple MCMC Regression (RF-MCMC)
This subsection describes the structure and development of the proposed hybrid algorithm. The hybrid RF-MCMC regression algorithm proposed in this study splits the data into the best possible number of discrete sets and labels them. Figures 6, 7 and 8 show the relationships between asphalt temperature and air temperature, time of the day, and depth.
These figures show that the data did not fit the predictions well when using a single linear regression model consisting of the four regression coefficients given by Equation (1). Figure 7 shows that the time variable consists of ten finite sets, and Figure 8 shows that the depth consists of four finite sets. Therefore, the data has 40 labelled discrete sets, which increases the total number of regression coefficients from 4 to 160 (4 × 40). The RF is used to learn the labels, and an MCMC regression is fitted for each labelled dataset. The training of the algorithm was carried out as follows.

1) Clustering the Data Using the K-Means Algorithm
The k-means clustering algorithm can be explained in the following four steps [59].
i. Randomly choose the initial j clusters C = {c1, ..., cj}.
ii. Iterate over the j clusters, and select the closest set of points to each cluster.
iii. Set a centroid, ci, for each cluster.
iv. Repeat steps ii and iii until there is no cluster change.
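The four steps above can be sketched as follows (a minimal numpy implementation; the function name and toy data are ours):

```python
import numpy as np

def kmeans(points, j, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    # i.   randomly choose the initial j centroids from the data points
    centroids = points[rng.choice(len(points), j, replace=False)]
    for _ in range(n_iter):
        # ii.  assign each point to its closest centroid
        labels = np.argmin(np.linalg.norm(
            points[:, None] - centroids[None, :], axis=2), axis=1)
        # iii. recompute each centroid as the mean of its cluster
        new = np.array([points[labels == k].mean(axis=0) for k in range(j)])
        # iv.  stop when no centroid changes
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Two well-separated toy clusters of three points each
pts = np.array([[0., 0.], [0., 1.], [1., 0.],
                [5., 5.], [5., 6.], [6., 5.]])
labels, cents = kmeans(pts, 2)
```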

2) Train the Random Forest to Learn the Cluster of the Training Data
The target data in this part of the RF is the set of training labels generated by the k-means clustering. However, as described earlier, a separate MCMC regression is used for each dataset.
To know which regression coefficients belonged to which dataset, it is essential to label the regression coefficients with the dataset. This was done using the RF, as shown in Figure 9. The RF learned the dataset label and assigned the corresponding label to each set of regression coefficients created using the MCMC algorithms.

3) Train the MCMC for Each Cluster
In Equations (5), (6), and (7), each MCMC posterior distribution depends on four values. However, in the proposed hybrid algorithm, the data were split into 40 sets, producing 160 values of β, and the posterior distribution for all 160 values can be rewritten as follows.
which is the product of the individual posteriors of the 40 datasets, where each dataset k has its own parameters (β0k, β1k, β2k, β3k, τk).

4) Label the Corresponding Regression Coefficients with the Labels Generated by the RF
The regression coefficient output of this algorithm is an m × n matrix, where m = 40 is the number of labelled datasets and n = 4 is the number of regression coefficients per set. In Equation (1), β0 is the offset value, β1 is the air temperature regression coefficient, β2 is the time-of-day regression coefficient, and β3 is the depth regression coefficient. The selection of a coefficient set from the 40 candidates depends on the label of the data determined by the RF, as shown in Equations (12) and (13). Therefore, the data labels generated by the RF were added to the sets of regression coefficients generated by the MCMC. Figure 9 shows the training process for the hybrid algorithm. The testing (prediction) procedure for the hybrid algorithm proceeds stepwise: read the test data, classify the test data into one of the 40 clusters using the RF classifier, read the regression coefficients corresponding to that class, and multiply the test data by those regression coefficients. The result is the predicted asphalt temperature.
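The stepwise prediction procedure can be sketched as follows. The stub classifier and coefficient values are hypothetical placeholders for the trained RF classifier and the 40 × 4 MCMC coefficient matrix, which this sketch does not reproduce.

```python
import numpy as np

def hybrid_predict(x, rf_classifier, coeff_table):
    """Prediction step of the hybrid RF-MCMC model:
    1. classify the test point into one of the 40 clusters,
    2. look up that cluster's four MCMC regression coefficients,
    3. apply Equation (1).

    coeff_table : 40 x 4 matrix of [b0, b1, b2, b3] rows, one per cluster
    x           : [air temperature, time of day, depth]
    """
    label = rf_classifier(x)               # cluster index in 0..39
    b = coeff_table[label]
    return b[0] + b[1] * x[0] + b[2] * x[1] + b[3] * x[2]

# Hypothetical stand-ins (the real values come from training):
coeffs = np.tile(np.array([2.0, 1.1, 0.05, -0.2]), (40, 1))
stub_rf = lambda x: 7                      # pretend the RF assigns cluster 7
x = np.array([30.0, 14.0, 2.0])            # air temp, hour, depth (cm)
print(hybrid_predict(x, stub_rf, coeffs))  # predicted asphalt temperature
```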

IV. RESULTS AND DISCUSSION
This section presents the development of the proposed ML models using three input combinations, and the final results are shown in Table 1. The algorithms developed in this study were tested on five random combinations of the test dataset, each with a length of 1500 points. The random forest, MCMC, and proposed RF-MCMC algorithms were tested on the same combinations of data. In each trial, the model performance in the testing phase of the prediction was evaluated using four metrics: R², RMSE, MAE, and NSE. 1) The parameter R² is the square of the correlation coefficient, where an R² of 1 means the best prediction and full similarity between the predicted data and the observed data.
Therefore, the parameter is prone to showing a good fit despite the constant offset present in the data [60].
2) The root mean square error (RMSE) measures the average magnitude of the prediction errors, and 3) the mean absolute error (MAE) measures the average absolute deviation between observation and prediction, where y and ŷ are the observed and predicted asphalt temperature, respectively, and n is the number of test/observation samples. 4) The Nash-Sutcliffe efficiency (NSE) is used to determine the goodness of fit between the observed and predicted data [63].
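The four evaluation metrics can be computed as follows, using their standard definitions; the observed/predicted series here are small illustrative arrays, not the paper's data.

```python
import numpy as np

def metrics(y_obs, y_pred):
    """R^2, RMSE, MAE, and NSE between observed and predicted series."""
    r = np.corrcoef(y_obs, y_pred)[0, 1]       # Pearson correlation
    r2 = r ** 2
    rmse = np.sqrt(np.mean((y_obs - y_pred) ** 2))
    mae = np.mean(np.abs(y_obs - y_pred))
    # NSE: 1 minus the ratio of residual variance to observed variance
    nse = 1 - np.sum((y_obs - y_pred) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)
    return r2, rmse, mae, nse

y_obs = np.array([10.0, 20.0, 30.0, 40.0])     # illustrative data
y_pred = np.array([11.0, 19.0, 31.0, 39.0])
r2, rmse, mae, nse = metrics(y_obs, y_pred)
```

Note that R² measured this way reflects only correlation, which is why, as stated above, it can indicate a good fit despite a constant offset; NSE penalises such offsets.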
The results show that the proposed algorithm gave better results for all evaluated parameters in this study. Table 1 shows that the hybrid algorithm with input combination 3 has the best performance, with R² = 0.9625, RMSE = 0.0378, MAE = 0.0308, and NSE = 0.9605. A graphical representation of the correlation coefficient R and the RMSE together with the standard deviation is shown as a Taylor diagram to intuitively compare the performance of the models. The Taylor diagrams for the five sets of data have four comparison points. The first point is the original (observed) data with R = 1 and RMSE = 0, and the other points are the data predicted by the different models. The predicted data points are expected to show the maximum R, the minimum RMSE, and a standard deviation similar to the observed data. Based on the Taylor diagrams, it is concluded that the hybrid algorithm data point is the closest to the actual data, followed by MCMC and RF. Furthermore, the RF, MCMC, and hybrid algorithms showed significant performance differences.
A boxplot is a graph that shows how the values in the data are distributed. Although boxplots appear rudimentary compared to a histogram or density plot, they take up less space, which is beneficial when comparing distributions across several groups or datasets. A boxplot may be used to determine whether a distribution is skewed and whether the data contain any potentially exceptional observations (outliers). The box is a rectangle with a line inside it at the median. The top and the bottom of the box represent the 75th and 25th percentiles (the 3rd and 1st quartiles). The distribution is skewed if the median is not in the centre of the box: positively skewed if the median is closer to the bottom and negatively skewed if the median is closer to the top of the box. Extreme values and outliers are frequently indicated with a red '+'. Figure 12 shows the boxplot analysis of the RF, MCMC, and hybrid algorithms for the estimated and measured asphalt temperature. In both cases, the hybrid algorithm showed the best performance, with no or minimal outlier points.
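The quartiles and outlier markers described above can be computed directly. This sketch assumes Tukey's common 1.5 × IQR whisker convention for flagging outliers; the data are illustrative.

```python
import numpy as np

def boxplot_stats(data):
    """Quartiles and outliers, as drawn in a boxplot."""
    q1, med, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr      # whisker fences
    outliers = data[(data < lo) | (data > hi)]   # values plotted as '+'
    return q1, med, q3, outliers

data = np.array([14.0, 15.0, 16.0, 17.0, 18.0, 30.0])  # 30 is an outlier
q1, med, q3, out = boxplot_stats(data)
```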

V. COMPARISON WITH PREVIOUS STUDY
The literature review showed that several researchers have developed regression models for predicting pavement temperature [4]. Some related regression models were therefore investigated and compared with the proposed method in Table 2. The presented formulas reveal that no single mechanism was shared by all of the other research; nonetheless, all models used regression to evaluate the pavement temperature on site. Validating the proposed hybrid algorithm against the other regression model results was therefore essential from a practical perspective. The comparison shows that the objective performance of the proposed hybrid model was the best among the considered machine learning methods, and this reduction in the objective function reflects excellent results for asphalt pavement temperature prediction. Figure 13 summarises the performance of the hybrid model against the results published in the literature. The proposed hybrid model showed an excellent prediction ability compared to the others: for instance, 95% better accuracy than the RF of Liu et al. [37], 99% better accuracy than the GPR of Nojumi et al. [38], 96% better accuracy than Asefzadeh et al. [34], and 81.8% better accuracy than GBELM. These percentages demonstrate that the proposed hybrid algorithm model has promising potential, high reliability and high accuracy for predicting the pavement temperature. This result illustrates the potential of advanced machine learning (AML) technologies in facilitating engineering applications, reducing human interaction, preventing regression model failure, and providing generalisation capability.

VI. CONCLUSIONS
This study developed a new hybrid RF-MCMC model for predicting the temperature of asphalt pavements in the Gaza Strip. To build the machine learning application of the RF-MCMC model, the asphalt pavement temperature was measured every two hours at depths of 0 cm, 2 cm, 5.5 cm, and 7 cm at different time intervals over one year. Additionally, static and dynamic prediction methods were introduced in the ML methods to improve the prediction accuracy at the points where the daily temperature range changes. This study merged two algorithms to develop the hybrid RF-MCMC model; the two concepts have different main aspects, strengths and weaknesses, but combining them led to excellent results. Even though basic machine learning models such as RF and MCMC can make good predictions of asphalt pavement temperature, the hybrid RF-MCMC performs better in terms of R²: after hyper-parameter tuning, the R² values were 0.9249 for RF, 0.9384 for MCMC, and 0.9606 for the hybrid RF-MCMC, so the hybrid algorithm showed the best prediction capability. Moreover, the RMSE, MAE and NSE were used to assess the performance of the proposed model. In the testing phase, the two-dimensional Taylor diagrams showed that the hybrid model matched the standard deviation of the observations with excellent correlation, and the reduced error is effective in predicting the optimal patterns of asphalt pavement temperature. Furthermore, the boxplots of the estimated and measured asphalt temperature showed that the proposed hybrid model exhibited the best performance in the validation comparison with the other techniques.