Toward a Cost-Effective Motorway Traffic State Estimation From Sparse Speed and GPS Data

In this paper, we propose a new data-driven traffic state estimation model that estimates traffic flow based on average speed data only. The model is devised to implement a cost-effective framework that aggregates heterogeneous sources of vehicles’ GPS and speed measurements to infer traffic flow using a novel triplet system called Conditionally Gaussian Observed Markov Fuzzy Switching Systems (CGOMFSM). Unlike its hard counterpart, CGOMFSM allows for a transient and gradual representation of traffic state transition and hence improves the estimation performance using a tractable scheme. The potential of the proposed model is illustrated through an application to the problem of traffic incident detection, particularly sporadic traffic congestion caused by unexpected road conditions. The performance of the proposed model is assessed using real traffic datasets from England highways. A simulation of traffic in the city of Salalah in Oman was conducted to evaluate the efficacy of the CGOMFSM-based traffic estimation and incident detection schemes with different penetration rates.


I. INTRODUCTION
Traffic state estimation is of paramount importance for the implementation of Intelligent Transportation Systems (ITSs) in the smart cities of the future [1]. It has traditionally been achieved using fixed sensors, such as inductive loops, radars and cameras, which allow for accurate observation and estimation of speed, flow and occupancy. The main disadvantage of these sensors is their high installation and maintenance cost, which prevents large-scale deployment. They are instead deployed on selected major road segments only, which limits their spatial coverage for traffic state estimation. Recent advances in Information and Communication Technologies, Ubiquitous Computing, Vehicular Networks, and Connected Vehicles have opened up new opportunities for smart cities to develop advanced ITSs that combine traditional fixed sensors with new ubiquitous sensing devices for city-wide traffic state estimation, such as cellular phones [2], GPS-equipped cars and devices [3], crowd-sourcing [4], VANETs [5], Connected Vehicles [6]-[9], and recently Vehicle-Infrastructure Integration (VII) technologies [10], [11]. While all these new sensing technologies address the coverage limitations of fixed sensors and allow for city-wide traffic state estimation, some of them have the further advantage of providing infrastructure-less and cost-efficient solutions, making them particularly interesting for cities in developing countries that lack road and sensing infrastructure for traffic monitoring. Probe Vehicles (PVs) [12]-[14] and cell phones [15] have been widely used for the implementation of infrastructure-less and cost-effective traffic state estimation solutions, mainly for travel time and signalized intersections' queue length estimation [14], [16].
The associate editor coordinating the review of this manuscript and approving it for publication was Michail Makridis.
Besides the penetration rate issue, these technologies cannot be used to directly estimate traffic flow. To address this problem, some research works have tried to extend PVs with other data sources in order to collect traffic flow data, such as equipping PVs with spacing measurement equipment to count surrounding vehicles [17]-[20] and combining PV data with traditional fixed sensor data (inductive loops, etc.) [21]-[23]. Nevertheless, traffic flow estimation based on mobile probe data only remains a challenging research problem that has been explored to a relatively limited extent in the state of the art. Recent works have used the fundamental diagram (FD) to estimate traffic flow or volume from PV trajectory data [24]-[26], but these FD-based approaches need a large data calibration effort for every individual road. Other works have used Kalman filters [27], [28], Naive Bayes and shockwave models [29], [30] to estimate traffic flow from PVs, but they are generally applicable only at signalized link approaches and intersections.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this paper, we propose a new data-driven traffic state estimation approach that relies on infrastructure-less smart city sensing technologies and can therefore support the implementation of traffic management services for smart cities in developing countries. We propose a model for traffic flow estimation on motorway segments based only on aggregated speed data collected from IoT-probe sources moving on these road segments. IoT-probe sources can be any individual connected devices, such as private cars and drivers' smartphones, and not only Floating Cars or dedicated PVs. The model can be used to implement traffic control systems without installing or maintaining any road infrastructure. The increasing number of vehicles and connected mobile devices in developing countries makes it possible to maintain the minimum penetration rate required for accurate traffic flow estimation. In this paper, we only present the proposed data-driven model of traffic state estimation [31], and we assume that the aggregated speed data are given as input. Issues related to the development of sensing architectures relying on aggregated GPS data sources, and their use for the implementation of practical traffic state estimation solutions in the context of smart cities, are outside the scope of this paper and have been addressed in many previous research works, such as [3] and [32]-[34]. More recent insight into the use of GPS data for city-wide traffic state estimation and monitoring in general can be found in [35], [36].
The contributions of this work are as follows:
• A new traffic state estimation algorithm based on the Conditionally Gaussian Observed Markov Fuzzy Switching Model (CGOMFSM), which relies on an explicit representation of the dependence between the traffic flow, speed and state, represented as stochastic processes. The proposed algorithm is used to estimate traffic flow based on average speed data only.
• A new parameter estimation algorithm based on the CGOMFSM model.
• An application of the proposed algorithm to the problem of abnormal traffic event (congestion) detection.
The remainder of this paper is organized as follows: Section II presents an overview of the techniques and approaches used for infrastructure-less traffic flow estimation in the state of the art, and describes the proposed estimation framework. Section III details the triplet model used to represent the traffic state evolution over time along with the underlying traffic state estimation algorithm. Section IV presents the derivation of the fuzzy parametrization of the triplet model. Section V details the parameter estimation scheme adapted for the proposed fuzzy parametrization and presents an application of the scheme to the problem of sporadic traffic incident detection. Section VI introduces an extension of the proposed estimation algorithm for one-step-ahead prediction of the traffic state. In Section VII, we report the results of the experimental study using both real datasets and simulated traffic data for traffic event detection applications.

II. STATE-OF-THE-ART AND PROPOSED FRAMEWORK

A. REVIEW OF UBIQUITOUS AND COST-EFFECTIVE TRAFFIC FLOW ESTIMATION
This section reviews the main state-of-the-art works related to the problem of ubiquitous and infrastructure-less traffic flow estimation, in terms of both sensing technologies and estimation models. Traffic flow estimation works using VANETs, Connected Vehicles and Vehicle-Infrastructure Integration are not considered given that these technologies require certain infrastructure to be deployed either on vehicles or on road sides or on both. An exhaustive review of traffic state estimation technologies and models can be found in [31].
GPS data have been widely used for traffic state estimation purposes, collected either by PVs [3], [36] or smartphones [32], [34]. PVs are recognised as cost-efficient sources of traffic data, especially for traffic speed and travel time estimation, but their use for traffic flow or traffic density estimation is challenging. Some researchers equipped PVs with range measurement sensors, such as cameras, to obtain a local flow estimation of surrounding vehicles [18], [37], while others have combined PV data with data collected on the ground, such as loop detectors [22], [23]. There is a relatively limited number of works on estimating traffic flow or traffic density from PV data only. The Fundamental Diagram (FD) has been used to estimate traffic flow or volume from PV trajectory data [24]-[26] on highways and major arteries. The major drawback of the FD method is that it requires calibration for each individual road with a sufficiently large amount of traffic data [21]. Aside from the well-studied issue of penetration rate [38], [39], the main problem of traditional PV-based methods concerns the nature of the fleets (mostly commercial, such as taxis and buses), which biases the estimation given that these types of fleets have their own specific driving and spatio-temporal mobility patterns that differ from those of private vehicles.
Cellular systems [40] have also been used to provide alternatives that avoid the cost and coverage limitations associated with infrastructure-based solutions, and some works have been proposed to estimate volumes of vehicles from anonymous cellular phone call data [41], [42]. These systems cannot generally provide fine-grained traffic volume estimates and require long processing times, which makes them unsuitable for real-time estimation [15], [43].
Smartphone applications have also been widely used for traffic monitoring in general, including traffic state and congestion estimation, commonly deployed as a crowd-sourcing approach [4] where traffic data are collected from mobile applications that run on road users' mobile phones. By combining the computing capabilities and the variety of sensors (GPS, accelerometers, microphones, cameras, etc.) of smartphones, different types of traffic data can be collected and processed on the phone side before being sent to the server side for further processing. Several research and commercial projects have been proposed (a good review can be found in [44]). Other works have also explored the use of GPS data streams [33], [34], [45], [46]. Smartphone applications and GPS data streams have been mainly used for average speed and travel time estimation, but, to the best of our knowledge, not for traffic flow estimation.
With respect to estimation approaches, most of the existing works collect traffic volume data from fixed-location sensors. Only a few works have addressed the problem of estimating traffic volume from mobile data sources. Different techniques have been used, such as statistical models [29], [30], [47], a combination of shockwave theory and maximum likelihood estimation (MLE) [26], [48]-[50], Kalman filters [27], [28], [51], compressive sensing techniques [52], and spatio-temporal correlation data-driven models [23], [47], [53], in addition to Flow Diagram-based models where travel speeds from probe vehicle data for each road are estimated and then converted to traffic volumes by exploiting the relationship between travel speeds and traffic volumes [54], [55]. A more complete review can be found in [31]. To the best of our knowledge, none of the previous research works has explored the use of the Conditionally Gaussian Observed Markov Switching Model for traffic state estimation in general, and particularly for traffic flow estimation from aggregated mobile speed measures.

B. PROPOSED FRAMEWORK
Many smart devices allow for a reasonably accurate and timely measurement of vehicle speed. Smart speedometers are available at reasonable prices and operate on almost any type of vehicle. There is also a myriad of mobile applications that can perform as well as smart tachometers. In addition to the radar guns and speed traps used by law enforcement officers and mobile patrols, the number of smart vehicles with embedded smart speed measuring devices has been increasing substantially over the last few years.
Our framework exploits the speed data collected from sensing devices that correspond to a specific road segment within a certain time interval. We assume that the traffic state is estimated at regular time intervals $I_n = ]t_{n-1}, t_n]$, $n > 0$. The collected speed data are used to estimate the average velocity, denoted hereafter as $v_n^s$, where $s$ refers to the road segment and $n$ to the period $I_n$. For GPS and speed data, we consider records of the form $(t_n, L_n, l_n, \omega_n, d_n)$, where $t_n$ refers to the timestamp, $L_n$ to the longitude, $l_n$ to the latitude, $\omega_n$ to the speed and $d_n$ to the direction of the vehicle movement during $I_n$. The GPS coordinates of the vehicle are mapped to the road segment using the map matching procedure described in [56]. Issues related to network connectivity, such as latency, and to power consumption are outside the scope of this research. We mainly focus on the data-driven model and its ability to infer the complete traffic state from a single traffic variable.
Once the location of a vehicle is matched to the road segment of interest, the vehicle speed is recorded throughout the journey. As soon as the period $I_n$ has elapsed, the vehicle speed data are averaged to $v_n$. Whenever possible, traffic flow data during $I_n$, denoted as $f_n$, are recorded as a training set for the estimation module and used afterwards to infer the traffic state using speed data only, as will be presented later. For the sake of simplicity, connectivity issues caused by transmission lags are not considered: we assume that the time intervals $I_n$ are sufficiently large so that communication delays do not impact the accuracy of the model. We distinguish between two phases (Figure 1): a training phase, in which both speed and flow data are available, and an estimation phase, which infers traffic flow and fluency from the observed speed $v_i$ only. We propose to model the depicted framework by three stochastic processes $F$, $V$ and $S$, where $F$ represents the flow variation, $V$ the speed variation and $S$ the traffic fluency. The difference between two consecutive switches $S_n$ and $S_{n+1}$ depicts the change of traffic condition. Let us note that the training phase can be conducted using a sample-based approach. Since we focus on motorways, we can select road segment samples based on common characteristics (basic section, on-ramp/off-ramp, speed limit, etc.). Ideally, the trained model could be directly transposed to road sections that share the same features.
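As an illustration of the aggregation step, the following Python sketch averages map-matched speed reports over the intervals $I_n = ]t_{n-1}, t_n]$. The function name `average_speed_per_interval`, the `(timestamp, speed)` record layout and the use of seconds are assumptions made for this example only; map matching is assumed to have been done beforehand.

```python
import math
from collections import defaultdict
from statistics import mean


def average_speed_per_interval(records, t0, interval_s):
    """Average reported speeds over intervals I_n = ]t_{n-1}, t_n].

    records    : iterable of (timestamp_s, speed) pairs, already
                 map-matched to the road segment of interest.
    t0         : start of the first interval, in seconds.
    interval_s : common interval length, in seconds.
    Returns a dict {n: average speed v_n} with n >= 1.
    """
    buckets = defaultdict(list)
    for t, speed in records:
        if t <= t0:
            continue  # the first interval is left-open, so t0 is excluded
        # Intervals are left-open and right-closed: a report stamped
        # exactly at t_n belongs to I_n, not to I_{n+1}.
        n = math.ceil((t - t0) / interval_s)
        buckets[n].append(speed)
    return {n: mean(speeds) for n, speeds in sorted(buckets.items())}
```

With 15-minute intervals (`interval_s=900`), a report stamped at 900 s falls in $I_1$ while one stamped at 901 s falls in $I_2$.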
The underlying dependency between traffic speed and flow, as well as the transient transition between traffic conditions over time, suggest the use of a model that (i) supports discrete-time discontinuities in a continuous fashion and (ii) represents the dependencies (assumed here to be linear) between the stochastic processes $F_1^N$ and $V_1^N$. To achieve a time-dependent representation of the traffic state, we introduce a latent process denoted as $S_1^N$ and referring to the traffic switch over time. Hence, the random variable $s_n$ represents the traffic state during the time interval $I_n$ in which the speed measure $v_n$ has been recorded.
The problem can be formulated as follows: given $P$ measures of average speed $v_1^P$ and traffic flow $f_1^P$ at a given road section spanning a time interval $T = \bigcup_{n=0}^{P} I_n$, where the $I_n$ are time periods of equal length, the goal is to determine a model that can be used to (i) explicitly represent the dependence between the stochastic processes $F$ and $V$ using an auxiliary process $S$ and (ii) construct a tractable procedure to estimate the traffic state from the speed data $V$ observed during a monitoring time interval.

III. TRAFFIC STATE MODELING AND ESTIMATION USING CGOMFSM
First, let us briefly present the model we will deal with. The CGOMFSM is an extension of the classical Conditionally Gaussian Markov Switching Model (CGMSM) defined as follows. Let us consider three discrete-time stochastic processes $F_1^N$, $V_1^N$, and $S_1^N$, where, for each $n \in [1, N]$, $F_n$ is a (hidden) real-valued random variable, $V_n$ is an (observed) real-valued random variable and $S_n$ is a (hidden) discrete-valued random variable with two states: $S_n \in \Omega = \{0, 1\}$.
Definition 1: Let us set $Z_n = (F_n, V_n)$, $T_n = (F_n, S_n, V_n)$ and assume the following: where Recursive filtering is not tractable in the general CGMSM setting, but becomes tractable under the condition that, for each $s_{n+1}$ This particular CGMSM, called ''Conditionally Gaussian Observed Markov Switching Model'' (CGOMSM) [57], allows for recursive optimal filtering even with a switching setup [58]. A fuzzy extension of the CGOMSM has been proposed and studied in [59], in which the switch process $S_1^N$ is no longer assumed discrete but takes its values in the interval $[0, 1]$ instead. Hence, the distribution of each $S_n$ is defined by a density $h_n : [0, 1] \to \mathbb{R}$ w.r.t. the measure $\nu = \delta_0 + \delta_1 + \mu_{]0,1[}$, with $\delta_0$, $\delta_1$ the Dirac distributions on 0 and 1 respectively, and $\mu_{]0,1[}$ the Lebesgue measure on $]0, 1[$. Thus, we have: and $= p(s_n = 0)\, p(s_{n+1} | s_n = 0) + p(s_n = 1)\, p(s_{n+1} | s_n = 1)$ The distribution of the fuzzy Markov chain $S_1^N = (S_1, \ldots, S_N)$ is defined by the density $p(s_1)$ and the conditional densities $p(s_{n+1} | s_n)$. All of them are densities on $[0, 1]$ with respect to $\nu$. Figure 2 shows the dependence graph between the model processes.

A. TRAFFIC STATE MODELING USING CGOMFSM
We assume that the average speed data are measured and recorded on a regular basis, while the total carriageway flow data are not necessarily observable. The main rationale for this assumption is that it is easy to aggregate traffic speed from sampled measures, whereas it is more difficult to determine the traffic volume unless the necessary, and often intrusive, equipment has been installed.
The proposed traffic modeling framework is composed of two main stages:
• Stage 1: Traffic model parameter estimation. At this level, historical traffic data are used to infer the CGOMFSM model parameters and fit the fuzzy parametrization to a particular dataset. Let us denote by $\{1, \ldots, N\}$ the measurement time points for which both traffic volume and average speed data are available. The data $Z_1^N = (F_1^N, V_1^N)$ are used to estimate the CGOMFSM parameters and the associated switch process $S_1^N$.
• Stage 2: Traffic state estimation. For any time point $n > N$, only speed data are recorded from a variable number of road users. The aggregated speed data $V_n$ are hence used to extrapolate the traffic volume $F_n | V_1^n$ and the current switch $S_n | V_1^n$ using a recursive filtering scheme, and to predict the next measurement $V_{n+1} | V_1^n, S_1^n$.
Within the CGOMFSM framework, let us denote by $\Gamma_{Z_n, Z_{n+1}}(s_n^{n+1})$ the covariance matrix of the vector $(Z_n, Z_{n+1})$ and by $\Delta_n^{n+1} = \left| E[s_{n+1} | v_1^{n+1}] - E[s_n | v_1^n] \right|$ the random variable corresponding to the switch variation during the time interval $I_n$. The higher $\Delta_n^{n+1}$, the more likely it is that an incident has occurred. Normally, traffic conditions transition gradually over time unless a sporadic or routine incident has occurred and triggered a relatively abrupt switch. Using the CGOMFSM assumption, the computation of the filter becomes tractable using the following equations: with and where
• $d(\nu(s_n^{n+1})) = d(\nu(s_n) \otimes \nu(s_{n+1}))$;
• $p(s_{n+1} | s_n)$ is given by the fuzzy parametrization (see Section IV);
• according to the Gaussian assumptions and (1), (3), .
Hence, the next switch is estimated using the equation: Finally, the traffic state estimation algorithm runs as follows:
• Compute $p(s_{n+1} | v_1^{n+1})$ using equations (7) and (8).
• Compute the estimated traffic state at time point $n + 1$ using:
• Estimate the traffic flow $F_{n+1}$ using:
• Finally, compute the switch gradient from $p(s_n | v_1^n)$ and $p(s_{n+1} | v_1^{n+1})$ using the following equation:
Remark: Integrating with respect to $\nu$ is not always possible in closed form. Hence, we use the following approximation: where $L$ is the number of fuzzy levels. Since the set of integration here is compact, the approximation (11) remains tractable.
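To make the remark on numerical integration concrete, the sketch below builds a quadrature for the measure $\nu = \delta_0 + \delta_1 + \mu_{]0,1[}$: unit weights at 0 and 1, plus $L$ interior fuzzy levels. The midpoint placement of the interior levels, and the names `nu_grid`, `expected_switch` and `switch_gradient`, are assumptions of this example and may differ from the approximation (11). The switch gradient is computed as the absolute difference between two consecutive filtered expectations of the switch.

```python
import numpy as np


def nu_grid(L):
    """Quadrature points and weights for nu = delta_0 + delta_1 + Lebesgue on ]0,1[.

    ASSUMPTION: the L interior fuzzy levels sit at interval midpoints
    (2l - 1) / (2L) with weight 1/L; L = 0 gives the hard model {0, 1}.
    """
    if L > 0:
        interior = (2 * np.arange(1, L + 1) - 1) / (2 * L)
        interior_w = np.full(L, 1.0 / L)
    else:
        interior = np.empty(0)
        interior_w = np.empty(0)
    points = np.concatenate(([0.0], interior, [1.0]))
    weights = np.concatenate(([1.0], interior_w, [1.0]))
    return points, weights


def expected_switch(density, points, weights):
    """Approximate E[s] = integral of s * p(s) d nu(s) on the grid."""
    return float(np.sum(weights * points * density))


def switch_gradient(density_prev, density_next, points, weights):
    """|E[s_{n+1} | v_1^{n+1}] - E[s_n | v_1^n]| from two filtered densities."""
    return abs(expected_switch(density_next, points, weights)
               - expected_switch(density_prev, points, weights))
```

On this grid, the integral of the constant function 1 equals 3 (one unit for each Dirac mass and one for the Lebesgue part), which is a quick sanity check for any density that is meant to integrate to 1 with respect to $\nu$.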

IV. PARAMETRIZATION OF THE FUZZY MARKOV CHAIN
In this section, we outline a possible parametrization of the density $p(s_n^{n+1})$ of the Markov chain $S_1^N$ suited to our target application. As we assume the model to be stationary, $p(s_n^{n+1})$ is time-independent and hence $p(s_n^{n+1}) \sim p(s_1^2)$. The density of $P(S_1^2)$ w.r.t. $\nu \otimes \nu$, where $\nu = \delta_0 + \delta_1 + \mu_{]0,1[}$, is assumed of the following shape: with $m > 0$, and $\int_0^1 \int_0^1 p(s_1, s_2)\, d\nu(s_1)\, d\nu(s_2) = 1$. This parametrization is defined by 5 parameters ($m$, $\alpha_0$, $\alpha_1$, $\beta$ and $\eta$). It is an extension of the parametrization studied in [59], where it was used for the estimation of the power consumption of buildings from outdoor temperatures.
Let us interpret this parametrization with respect to the targeted application. The probabilities that the traffic remains in the same boundary state are denoted by $\alpha_0$ for very low traffic volume and by $\alpha_1$ for traffic peaks. Another quite realistic assumption regarding the behaviour of traffic dynamics at the boundary conditions is the following: a direct transition from an unoccupied road (state 0) to extremely congested traffic (state 1), and conversely, is fairly improbable when the number of discrete fuzzy levels $L$ is sufficiently high (typically when $L > 5$). Therefore, we will set $\beta = 0$ in the experimental part, yet this parameter is retained for the sake of generality and for possible use in other applications.
The parameter $m$ makes it possible to introduce very different behaviors for the Markov chain that governs the state of traffic. Figure 3 plots three possible shapes of the joint density $p(s_1^2)$ for different values of $m$. When $m = 1$, the joint density has a piecewise linear shape and is proportional to the difference between $s_1$ and $s_2$. When $m$ is large, $p(s_1^2)$ is almost constant within the diagonal. Setting $m$ below 1 imposes a low probability on distant $s_1^2$, and the closer $s_1$ is to $s_2$, the higher $p(s_1^2)$. By marginalizing (12) (see the calculations reported in the Appendix), the density $p(s_1)$ is given by Still from the Appendix, we have Hence the joint a priori fuzzy density is only parametrized by four parameters $m$, $\alpha_0$, $\alpha_1$ and $\beta$. The density $p(s_2 | s_1)$ is the ratio between the joint density (12) and the marginal density (13). We have to distinguish between different cases, according to the value of $s_1$: where we set , and
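The relation between the joint, marginal and conditional densities can be checked numerically on a discretized grid. The sketch below is illustrative only: the interior kernel $1 - |s_1 - s_2|^m$ is a placeholder chosen to mimic the qualitative behaviour described above and is not claimed to be the parametrization (12); the midpoint grid and the names `fuzzy_joint` and `conditional_density` are likewise assumptions. What the sketch does reproduce is the structure of the argument: $\eta$ is fixed by the normalization constraint once $m$, $\alpha_0$, $\alpha_1$ and $\beta$ are given, and $p(s_2 | s_1)$ is the joint density divided by the marginal.

```python
import numpy as np


def fuzzy_joint(L, m, alpha0, alpha1, beta):
    """Discretized joint density p(s1, s2) w.r.t. nu x nu (requires L >= 1).

    ASSUMPTION: away from the four corner points, the density is
    eta * (1 - |s1 - s2|**m); this kernel is a stand-in, not eq. (12).
    """
    pts = np.concatenate(([0.0], (2 * np.arange(1, L + 1) - 1) / (2 * L), [1.0]))
    w = np.concatenate(([1.0], np.full(L, 1.0 / L), [1.0]))
    kernel = 1.0 - np.abs(pts[:, None] - pts[None, :]) ** m
    corners = np.zeros(kernel.shape, dtype=bool)
    corners[0, 0] = corners[0, -1] = corners[-1, 0] = corners[-1, -1] = True
    ww = w[:, None] * w[None, :]
    # eta follows from the constraint that the joint density integrates to 1.
    eta = (1.0 - alpha0 - alpha1 - 2.0 * beta) / np.sum(ww[~corners] * kernel[~corners])
    joint = eta * kernel
    joint[0, 0], joint[-1, -1] = alpha0, alpha1   # staying at a boundary state
    joint[0, -1] = joint[-1, 0] = beta            # direct jump between boundaries
    return pts, w, joint


def conditional_density(joint, w):
    """p(s2 | s1) as the ratio of the joint density to the marginal p(s1)."""
    marginal = joint @ w                          # integrate s2 out w.r.t. nu
    return joint / marginal[:, None], marginal
```

Each row of the returned conditional density integrates to 1 with respect to $\nu$, which is the property the filtering recursion relies on.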

V. PARAMETER ESTIMATION AND EVENT DETECTION
In this section we first explain how the parameters of the model can be estimated from training samples and then we present how the parameter estimation algorithm can be used to detect sporadic events from traffic data. The proposed parameter estimation algorithm is devised based on an iterative scheme in which each step q consists in estimating both the joint a priori density and the means and covariances of the CGOMFSM.

A. PARAMETERS OF THE JOINT A PRIORI DENSITY
As specified in eq. (12), the density $p(s_n, s_{n+1})$ is defined by four parameters $\beta$, $\alpha_0$, $\alpha_1$ and $m$, recalling that the parameter $\eta$ is deduced from them using (14). We consider a fuzzy parametrization preset to a fixed $m$. To fit the fuzzy parametrization to the data of interest, we use the following sampling-based procedure:
1) Simulate $Q$ realizations of the process $S_1^N$ conditionally to $Z_1^N$ using the a posteriori distribution, i.e. using $p(s_{n+1} | s_n, z_1^N)$. We denote the $q$th realization by s N and $\beta^{q+1}$. Deduce $\eta^{q+1}$ using (14).
Figure 4 illustrates the trajectories of Markov chains ($S$) simulated using the a priori distribution $p(s_2 | s_1)$ and the a posteriori distribution $p(s_{n+1}, s_n | z_1^N)$.
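Once a transition density has been discretized over the fuzzy levels, drawing a realization of the switch process reduces to sampling a finite-state Markov chain. The sketch below shows this sampling step only; it assumes that the transition density has already been converted into a row-stochastic matrix (for a conditional density on a quadrature grid, by multiplying each column by its quadrature weight). The function name `simulate_switch_levels` is hypothetical, and the subsequent parameter updates of the procedure are not reproduced.

```python
import numpy as np


def simulate_switch_levels(trans_prob, init_prob, n_steps, rng):
    """Draw one trajectory of fuzzy-level indices from a discretized chain.

    trans_prob : (K, K) row-stochastic matrix, entry [i, j] being the
                 probability of moving from level i to level j.
    init_prob  : length-K probability vector for the first switch.
    rng        : numpy.random.Generator used for reproducible draws.
    """
    n_levels = len(init_prob)
    path = np.empty(n_steps, dtype=int)
    path[0] = rng.choice(n_levels, p=init_prob)
    for n in range(1, n_steps):
        # The next level depends only on the current one (Markov property).
        path[n] = rng.choice(n_levels, p=trans_prob[path[n - 1]])
    return path
```

Calling the function $Q$ times with the same generator yields $Q$ independent realizations; mapping the indices back to grid points gives trajectories in $[0, 1]$.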

B. ESTIMATION OF THE MEANS AND COVARIANCES
The model parameters, i.e. mean vectors and covariance matrices, are estimated using a fuzzy C-means procedure. Let The state space is divided into two sub-spaces: stable traffic and unstable traffic. Hence, we make an initial estimation $\mu_k$ such that $\forall k \in \{0, \ldots,$ and initialize the
3) Update the membership matrix $\kappa^{q+1}$ using:
First, we calculate the a posteriori probabilities $\psi_i(s_1^2) = p(s_n, s_{n+1} | z_1^n)$ from the membership matrix $\kappa$ using $\psi_i(s_1^2) = \kappa_{i,s_1} \kappa_{i+1,s_2}$
For each iteration $q > 1$:
• Compute the model means and covariances using: and
The matrices $A_{n+1}(s_n^{n+1})$ and $B_{n+1}(s_n^{n+1})$ are calculated using the following formulas:
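For readers unfamiliar with fuzzy C-means, the sketch below implements the standard textbook algorithm (alternating updates of cluster centers and of a membership matrix whose rows sum to 1). It is not the adapted procedure of this section, whose update formulas are not reproduced here: the random initialization, the Euclidean distance and the names `fuzzy_c_means` and `fuzzifier` are assumptions of the example. The exponent is called `fuzzifier` to avoid confusion with the parameter $m$ of the fuzzy parametrization.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, fuzzifier=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means on the rows of X (shape: samples x features).

    Returns (centers, U), where U[i, k] is the membership of sample i
    in cluster k and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** fuzzifier
        # Centers are membership-weighted means of the samples.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # u_ik is proportional to d_ik^(-2 / (fuzzifier - 1)).
        inv = dist ** (-2.0 / (fuzzifier - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Applied to pairs $(f_n, v_n)$ with two clusters, the memberships give a soft assignment of each observation to the stable and unstable traffic sub-spaces, from which initial means can be read off the returned centers.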

D. EVENT AND ANOMALY DETECTION
Under normal conditions, traffic tends to transition seamlessly from one state to another. The proposed a priori densities for the fuzzy joint $s_1^2$ are designed in such a way that abrupt transitions are prevented. However, when a sporadic traffic incident occurs, it causes the model to pass from a given switch to a relatively distant one. The severity of the incident can be intuitively estimated based on the gap between two consecutive switches: the higher the gap $|s_{n+1} - s_n|$, the more likely it is that an event has occurred at time $n + 1$.
From a conceptual standpoint, we should distinguish between two different types of incidents: routine and sporadic. Routine events occur on a regular basis and correspond to traffic jams at peak hours. These events are detected when the estimated switch exceeds a specific threshold $\tau$. The threshold $\tau$ is estimated from the jam density and speed and typically corresponds to 0.8. Using the proposed model, routine events can be detected, and even predicted, when the estimated switch at time point $n + 1$ satisfies the following conditions: (i) $s_{n+1} > \tau$ and (ii) $|s_{n+1} - s_n| < \epsilon$, where $\epsilon$ is a preset threshold that depends on the number of discrete fuzzy levels. For example, if there are $L$ fuzzy levels, and if we want the model to flag any pair of consecutive switches whose gap exceeds two levels, $\epsilon$ should be set to $2/L$. Sporadic events are by nature more complex to model and foresee. Nonetheless, they can be estimated based on fair hypotheses that correspond to real-world situations:
• Case 1: The traffic is heavy (which implies that the last estimated switch $s_n > \tau$). The next switch $s_{n+1}$ is lower yet still greater than $\tau$. This case corresponds to the closure of one or multiple lanes as a result of a car collision or a broken-down vehicle.
• Case 2: The traffic is in a road capacity flow state, which typically corresponds to switches lower than $\tau$ and higher than $\tau/2$. The estimated switch $s_{n+1}$ is such that n+1 . This may indicate an unexpected road condition such as inclement weather, reduced visibility or a sudden road closure. Figure 5 illustrates some examples of traffic anomalies in which we distinguish between the two cases of traffic incidents by the shape of the plot of switches.
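The routine-event conditions can be written as a small decision rule. In the sketch below, the routine branch follows conditions (i) and (ii) directly, with the gap threshold set to two fuzzy levels out of $L$. Flagging every transition whose gap reaches that threshold as a sporadic candidate is a simplifying assumption of this example: it follows the remark that a larger gap makes an event more likely, but it does not reproduce the finer distinction between Case 1 and Case 2. The function name and default values are illustrative.

```python
def classify_transition(s_prev, s_next, tau=0.8, n_levels=20, gap_levels=2):
    """Label the transition between two consecutive estimated switches.

    tau        : congestion threshold on the switch (0.8 in the text).
    n_levels   : number of fuzzy levels L.
    gap_levels : gap, in fuzzy levels, treated as abrupt, so that the
                 gap threshold equals gap_levels / n_levels.
    """
    eps = gap_levels / n_levels
    gap = abs(s_next - s_prev)
    if gap >= eps:
        # ASSUMPTION: any abrupt jump is reported as a sporadic candidate.
        return "sporadic"
    if s_next > tau:
        # Conditions (i) s_{n+1} > tau and (ii) |s_{n+1} - s_n| < eps.
        return "routine"
    return "normal"
```

With the defaults, `classify_transition(0.85, 0.90)` is labelled as a routine event, whereas a jump from 0.50 to 0.85 within a single interval is labelled as a sporadic candidate.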

VI. ONE-STEP AHEAD TRAFFIC STATE PREDICTION
To leverage the proposed CGOMFSM, we propose a tractable scheme for the prediction of traffic data (speed and flow) at time index $n + 1$ from the historical speed data only. In the stochastic framework considered here, it consists in estimating $E[Z_{n+1} | v_1^n]$, which corresponds to the expected values of the traffic flow $F_{n+1}$ and average speed $V_{n+1}$ at time index $n + 1$ given the past observations $v_1^n$. Formally, this is equivalent to computing the following:
$$E[Z_{n+1} | v_1^n] = \iint E[Z_{n+1} | s_n^{n+1}, v_1^n]\; p(s_n^{n+1} | v_1^n)\; d\nu(s_n^{n+1}),$$
denoting by $d\nu(s_n^{n+1})$ the product measure $d\nu(s_n) \otimes d\nu(s_{n+1})$.
• The term $E[Z_{n+1} | s_n^{n+1}, v_1^n]$ can be calculated using (1) by
• The second term can be rewritten $p(s_n^{n+1} | v_1^n) = p(s_{n+1} | s_n)\, p(s_n | v_1^n)$, where $p(s_n | v_1^n)$ are the filtered posterior probabilities of jumps, whose recursive calculation is detailed in (7) and (8).
Hence it is possible to predict both traffic flow E F n+1 | v n 1 and average speed E V n+1 | v n 1 in a recursive way by taking margins of (24).
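On a discretized grid of fuzzy levels, the prediction reduces to a weighted double sum over pairs of levels. The sketch below assumes that the conditional means $E[Z_{n+1} | s_n^{n+1}, v_1^n]$, the transition density and the filtered density have already been evaluated on the grid, and that `weights` are the quadrature weights associated with $\nu$; the function name `predict_next` and the array layout are assumptions of this example.

```python
import numpy as np


def predict_next(cond_mean, trans_density, filt_density, weights):
    """Approximate E[Z_{n+1} | v_1^n] on a grid of K fuzzy levels.

    cond_mean     : (K, K, 2) array, E[Z_{n+1} | s_n = i, s_{n+1} = j, v_1^n],
                    last axis ordered as (flow, speed).
    trans_density : (K, K) array, p(s_{n+1} = j | s_n = i).
    filt_density  : (K,) array, filtered density p(s_n = i | v_1^n).
    weights       : (K,) quadrature weights for the measure nu.
    """
    ww = weights[:, None] * weights[None, :]
    # p(s_n, s_{n+1} | v_1^n) = p(s_{n+1} | s_n) p(s_n | v_1^n), times weights.
    pair_prob = ww * filt_density[:, None] * trans_density
    return np.einsum("ij,ijk->k", pair_prob, cond_mean)
```

The two components of the returned vector are the predicted flow and the predicted speed, which corresponds to taking the two marginals of the vector-valued expectation.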

VII. EXPERIMENTAL RESULTS
The objective of this experimental study is twofold. First, we evaluate the efficacy of the proposed traffic flow estimation model using real ground truth traffic data. Highway datasets from England have been used for this purpose. Second, we evaluate the effectiveness of the model under different penetration rates. A simulation-based study of the city of Salalah in the Sultanate of Oman has been used for this purpose.

A. EXPERIMENTS WITH REAL DATASETS
In this part of the experimental study, we focus on the validation of the proposed model using ground truth datasets. The goal is to measure the extent to which the fuzzy approximation of non-linear systems applies to highway traffic data. For this purpose, we selected the highway datasets for England that are publicly available through the Motorway Incident Detection and Automatic Signalling (MIDAS) system. Each MIDAS site reports, every 15 minutes, the traffic volume for different categories of vehicles, classified by length, as well as the average speed. For the sake of simplicity, we assume the following:
• The traffic is homogeneous, i.e. we consider only one category of vehicles corresponding to 0-520 cm.
• The traffic is stationary, i.e. we discard traffic states that correspond to conspicuous outliers.
To evaluate the accuracy of the flow estimation, we considered the two error measures MAPE and RMSE defined as follows:
$$\mathrm{MAPE} = \frac{100}{T} \sum_{i=1}^{T} \left| \frac{x_i - \hat{x}_i}{x_i} \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{i=1}^{T} (x_i - \hat{x}_i)^2},$$
where $x_i$ represents the observed flow ($x_i = f_i$) or the recorded speed ($x_i = v_i$), $\hat{x}_i$ represents the estimated flow ($\hat{x}_i = \hat{f}_i$) or the predicted speed ($\hat{x}_i = \hat{v}_i$) at time point $i$, and $T$ is the observation interval. Figure 6 shows two monitoring points situated on the A1 highway, corresponding to the sites A1/9547A and A1/9542A. We studied the impact of the number of fuzzy levels on the estimation accuracy. We varied the number of fuzzy levels $L$ between 0 (hard model) and 50 for different values of the parameter $m$. The results of these experiments were averaged and compared to LSTM, ARIMA with order (1, 0, 1) and seasonal order (1, 1, 0, 96), and random walk (RW) for traffic speed prediction. The results of these experiments are reported in Table 1. Similarly, we considered different numbers of fuzzy levels paired with values of the parameter $m$ and we assessed the accuracy of traffic flow estimation. Table 2 reports the results of this experiment.
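For reference, the two error measures can be computed as in the sketch below, using the usual definitions of MAPE (expressed as a percentage) and RMSE. The function names are illustrative, and the observed values are assumed to be non-zero so that the relative error is defined.

```python
import numpy as np


def mape(observed, estimated):
    """Mean absolute percentage error, in percent (observed must be non-zero)."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return 100.0 * np.mean(np.abs((observed - estimated) / observed))


def rmse(observed, estimated):
    """Root mean square error, in the unit of the observed quantity."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return np.sqrt(np.mean((observed - estimated) ** 2))
```

MAPE is scale-free, which makes flow and speed errors comparable, whereas RMSE keeps the unit of the quantity and penalizes large deviations more heavily.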
Unsurprisingly, the estimation is more accurate for larger fuzzy levels. However, the MAPE and the RMSE rates tend to stabilise for a number of fuzzy levels (L > 20). Since a higher number of fuzzy levels incurs higher computation costs, we can consider that 20 fuzzy levels is a fair tradeoff between estimation accuracy and computational costs.
Impact of the penetration rate on the event detection accuracy (detection lag: time elapsed between the event occurrence and its eventual detection; false alarm: percentage of wrongly detected events out of the total of reported events; detection ratio: percentage of correctly detected events).
The parameter $m$ was varied between 0.1 and 10, with a step of 0.1 for $m < 1$ and a step of 1 for $m > 1$. We have noticed that the parameter $m$ does not significantly impact the overall accuracy of flow estimation. The obtained results have not shown a direct correlation between the estimation and prediction accuracy and the parameter $m$. However, the following conclusions were drawn:
• For a large number of fuzzy levels (L > 20), m > 1 provided the best estimation accuracy results.
• For a large number of fuzzy levels (L > 20), m = 1 provided the best prediction accuracy results.
• Using the hard model (L = 0), the best estimation and prediction results were obtained using m < 1.
• The worst prediction results were obtained with m > 1 regardless of the number of fuzzy levels.
• In figure 7, the estimation and the prediction errors are relatively higher than those observed in figure 8. This phenomenon can be explained by the sensitivity of the model to high variances especially when the number of fuzzy levels is significantly high.

B. SIMULATION-BASED EXPERIMENTS OF THE IMPACT OF THE PENETRATION RATE
Traffic in the Sultanate of Oman has become a major concern owing to the sharp increase in the number of personal vehicles, caused by the relatively limited public transportation service, and to the surge in the number of trucks and utility vehicles driven by the rapid growth of logistics-related activities across the country. The city of Salalah, the main city in Dhofar governorate, has witnessed a substantial increase in the number of vehicles over recent years, ascribable to the diversification of economic activities and the improvement of the standard of living. Furthermore, during the monsoon tourist season, Salalah receives a high influx of visitors from the other governorates and the bordering countries. These visitors travel in their private cars and significantly increase the traffic during the season. Figure 9 shows the road network of Salalah and highlights the studied stretch of highway. We used SUMO (Simulation of Urban MObility) to simulate traffic on the studied highway.
The objective of this study is to answer the following questions:
• What is the optimal penetration rate that ensures a near-exact estimation of traffic data? The penetration rate is defined as the percentage of sensing devices reporting their speed. To answer this question, we varied the penetration rate ρ from 1% to 10% in steps of 1%. The results of this experiment are reported in Table 3.
• Since the average speed is collected every 15 minutes, if an incident occurs during the time interval $[t_n, t_{n+1}]$, how long does it take to detect it? In other words, if an event occurs during an interval $I_n$, how many subsequent intervals elapse before the event is detected? To answer this question, we randomly simulated 1000 traffic incidents by blocking one lane at different locations of the highway at distinct times of the day. The detection time lag results are reported in Table 4. Furthermore, we report the missed event and the false alarm ratios.
Remarks:
• The penetration rate was simulated by random sampling of the vehicles crossing the segment of interest. For example, to obtain a penetration rate of 10%, the sampling scheme reports the data from one vehicle out of 10.
• The number of fuzzy levels considered in this series of experiments is set to 20. The rationale for this choice is that 20 fuzzy levels yielded accuracy results close to those obtained with L = 50 in significantly less time.
• For this experiment, we set m = 0.5.
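The sampling scheme and the detection measures used in this experiment can be sketched as follows. This is an illustrative sketch, not the authors' simulation code: the function names, the use of `random.Random.sample` to draw a fixed fraction of vehicles, and the matching window `max_lag` (the number of intervals within which an alarm is attributed to an incident) are all assumptions. Incidents and alarms are given as indices of 15-minute intervals.

```python
import random


def sample_probe_speeds(speeds, rate, rng):
    """Report the speed of a random fraction `rate` of the vehicles.

    With rate = 0.1, one vehicle out of 10 is retained (assumed scheme).
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    k = round(rate * len(speeds))
    return rng.sample(list(speeds), k)


def average_speed(speeds):
    """Mean speed of the reporting vehicles, or None if nobody reported."""
    return sum(speeds) / len(speeds) if speeds else None


def detection_stats(events, alarms, max_lag=4):
    """Return (detection ratio %, false alarm %, mean detection lag).

    events:  interval indices at which incidents were injected
    alarms:  interval indices at which the detector reported an incident
    max_lag: hypothetical window (in intervals) for matching an alarm to an event
    """
    unused = sorted(alarms)
    lags = []
    for event in sorted(events):
        # Earliest unmatched alarm raised within the window after the event.
        match = next((a for a in unused if event <= a <= event + max_lag), None)
        if match is not None:
            unused.remove(match)
            lags.append(match - event)
    detection_ratio = 100.0 * len(lags) / len(events) if events else 0.0
    false_alarm = 100.0 * len(unused) / len(alarms) if alarms else 0.0
    mean_lag = sum(lags) / len(lags) if lags else None
    return detection_ratio, false_alarm, mean_lag
```

The detection lag is expressed in numbers of intervals, so multiplying it by 15 gives minutes under the collection period stated above. Passing a seeded generator such as `random.Random(0)` makes the sampling reproducible.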

VIII. CONCLUSION
In this paper, we proposed a cost-effective framework for the integrated estimation and prediction of traffic state. The proposed framework relies on a triplet model, called CGOMFSM, that explicitly represents the variation of traffic over time in a fuzzy manner. Such modeling allows the rapid detection of anomalies resulting from an abrupt state change, which likely indicates a road incident. The experimental study was conducted at two levels. First, we evaluated the efficacy of the CGOMFSM in terms of traffic flow estimation from aggregated average speed, as well as its ability to predict the traffic state over a short time window. Then, we conducted a simulation to assess the impact of the penetration rate (i.e., the ratio of vehicles reporting their speed) on the estimation accuracy. The main results suggest that, given a sufficient penetration rate (around 10%), the proposed conditionally Gaussian observed Markov fuzzy switching model yields satisfactory results in terms of parameter estimation, speed prediction and traffic flow estimation. As the experimental results are promising, the next research stage will focus on evaluating the performance of the proposed approach in real-world environments in collaboration with the competent authorities.
To infer a relationship between the parameters $\alpha_0$, $\beta$, $\alpha_1$ and $\eta$, we use the normalization condition. After a few calculations not reported here, we get
$$\alpha_0 + \alpha_1 + 2\beta + \eta\,\frac{m(5m+11)}{(m+1)(m+2)} = 1,$$
so that
$$\eta = \left(1 - \alpha_0 - \alpha_1 - 2\beta\right)\frac{(m+1)(m+2)}{m(5m+11)},$$
and the model is parametrized by only four parameters: $m$, $\alpha_0$, $\alpha_1$ and $\beta$.
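As a quick numerical check, the normalization constraint α0 + α1 + 2β + η m(5m + 11)/((m + 1)(m + 2)) = 1 can be solved for η and verified in a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, m is assumed to be strictly positive, and α0 + α1 + 2β is assumed to lie in [0, 1] so that η is non-negative.

```python
def eta_from_normalization(m, alpha0, alpha1, beta):
    """Solve the normalization constraint for eta."""
    if m <= 0:
        raise ValueError("m is assumed to be strictly positive")
    discrete_mass = alpha0 + alpha1 + 2 * beta
    if not 0.0 <= discrete_mass <= 1.0:
        raise ValueError("alpha0 + alpha1 + 2*beta is assumed to lie in [0, 1]")
    return (1.0 - discrete_mass) * (m + 1) * (m + 2) / (m * (5 * m + 11))


def normalization_lhs(m, alpha0, alpha1, beta, eta):
    """Left-hand side of the constraint; equals 1 for a consistent eta."""
    return alpha0 + alpha1 + 2 * beta + eta * m * (5 * m + 11) / ((m + 1) * (m + 2))
```

For example, with m = 1, α0 = α1 = 0.2 and β = 0.1, the remaining mass 0.4 is scaled by 6/16, giving η = 0.15, and substituting this value back returns 1 up to rounding.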