Development of the Västra Götaland Operating Cycle for Long-Haul Heavy-Duty Vehicles

In this paper, a complete operating cycle (OC) description is developed for heavy-duty vehicles traveling long distances in the region of Västra Götaland, Sweden. Variation amongst road transport missions is accounted for using a collection of stochastic models. These are parametrized from log data for all the influential road parameters that may affect the energy performance of heavy trucks, including topography, curvature, speed limits, and stop signs. The statistical properties of the developed OC description are investigated numerically by considering some composite variables, which condense the salient information about the road characteristics and are inspired by two existing classification systems. Two examples are adduced to illustrate the potential of the OC format, which enables ease of classification and detailed simulation of energy efficiency for individual vehicles, with application in vehicle design optimization and selection, production planning, and predictive maintenance. In particular, for the truck used in the first example, a Volvo FH13 equipped with a diesel engine, simulation results indicate mean CO$_2$ emissions of around $1700\,\text{g}\,\text{km}^{-1}$, with a standard deviation of $360\,\text{g}\,\text{km}^{-1}$; in the second example, dealing with electrical fleet sizing, the optimal proportion shows a predominance of tractor-semitrailer vehicles (70%) equipped with 4 motors and 11 battery packs.

INDEX TERMS Operating cycle, mission classification, road transport mission, stochastic modeling, autoregressive models, Markov models.

Symbol: Description
$C$: Random variable for curvature ($\text{m}^{-1}$).
$G_R$: Generator matrix for road type.
$G_{V|r_k}$: Generator matrix for speed sign.
$L_s$: Sampling length for topography (m).
$L_{\text{tot}}$: Total road length (km).
$L_R$: Mean length vector for road type (km).
$L_R$: Stochastic mean length vector for road type (km).
$L_{Ri}$: Mean length for road type (km).
$L_{Ri}$: Stochastic mean length for road type (km).
$L_{V|r_k}$: Conditional mean length vector for speed sign (km).
$L_{V|r_k}$: Conditional stochastic mean length vector for speed sign (km).
$L_{Vi|r_k}$: Conditional mean length for speed sign (km).
$L_{Vi|r_k}$: Conditional stochastic mean length for speed sign (km).
$M_{C|r_i}$: Conditional stochastic mean log-radius (ln m).
$M_{L|r_i}$: Conditional stochastic mean log-length (ln m).
$V$: Random variable for speed sign ($\text{km}\,\text{h}^{-1}$).
$X_{\text{GTA}}$: Multivariate random vector for GTA (km), ($\text{km}^{-1}$).
$X_{\text{UFD}}$: Multivariate random vector for UFD (km), ($\text{km}^{-1}$).
$X_{R_s}$: Multivariate random vector of sOC parameters (ln m), (ln km).
$Y$: Random variable for road grade (%).
$\Pi_R$: Stochastic stationary probability vector for road type.
$\Pi_{Ri}$: Stochastic stationary probability for road type.
$\Pi_{V|r_k}$: Conditional stochastic stationary probability vector for speed signs.
$\Pi_{Vi|r_k}$: Conditional stochastic stationary probability for speed signs.
$\Sigma_{C|r_i}$: Conditional stochastic std log-radius (ln m).
$\Sigma_{L|r_i}$: Conditional stochastic std log-length (ln m).
$\Sigma_{Y|r_i}$: Conditional stochastic standard deviation of topography (%).
$e_Y$: Error term for topography (%).
$g_{Rij}$: Entry of the generator matrix for road type.
$g_{Vij|r_k}$: Entry of the conditional generator matrix for speed signs.
$n'_{C|r_i}$: Conditional expected number of curves ($\text{km}^{-1}$).
$\mu$: Vector-valued mean of random sOC parameters (ln m), (ln km).
$\pi_R$: Stationary probability vector for road type.
$\pi_{Ri}$: Stationary probability for road type.
$\pi_{V|r_k}$: Conditional stationary probability vector for speed signs.
$\pi_{Vi|r_k}$: Conditional stationary probability for speed signs.
$\varphi_{Y|r_i}$: Conditional autoregressive coefficient for topography.
$\sigma_{e_Y|r_i}$: Conditional standard deviation for the innovation in the topography (%).
$\sigma_{C|r_i}$: Conditional standard deviation of log-radius (ln m).
$\sigma_{L|r_i}$: Conditional standard deviation of log-length (ln m).
$\sigma_{Y|r_i}$: Conditional standard deviation of topography (%).
$\mathrm{Dir}(\cdot)$: Dirichlet distribution.
$\mathcal{E}(\cdot)$: Exponential distribution.
$\mathrm{Ga}(\cdot, \cdot)$: Gamma distribution.
$\mathcal{N}(\cdot, \cdot)$: Normal distribution.

VOLUME 11, 2023

I. INTRODUCTION
The optimal design of both commercial and heavy-duty vehicles is traditionally supported by the extensive use of simulations in dedicated virtual environments, which allow for a preliminary evaluation of energy performance. However, at early stages, prototypes are typically tested on a reduced number of transport missions, which should ideally be representative of the overall intended usage. Reproducing the variation amongst road operations accurately is nonetheless essential to correctly estimate the energy performance in real-world scenarios, which depends on the characteristics of the transportation task [1], [2], [3], [4]. Indeed, the inherent variation amongst road transport missions may excite different responses in ground vehicles, implying the need for a proper understanding of where and how they are operated, especially in view of the impending paradigm shift towards full automation and electrification [5], [6]. This also implies the need to evaluate in isolation the contribution of individual parameters like road properties, weather, and traffic conditions. In this context, the conventional description in terms of a driving cycle is not suitable, since all the relevant information about the operating environment is lost in the process of synthesizing a representative speed profile [7], [8]. By contrast, the operating cycle (OC) format, recently proposed in [9] and [10], explicitly accounts for the influence of the surroundings on vehicular performance, allowing for ease of classification of road operations and detailed assessment of energy efficiency. Previous works on the OC representation have, however, been limited to individual transport operations and the variation within them. Instead, this paper deals with the development of a complete OC description for long-haul heavy-duty vehicles that considers variation amongst transport missions. More specifically, the spread in road conditions, which is reflected in the spread amongst transport operations and, in turn, in energy performance, is captured by modeling the variation in the influential parameters.
To better highlight the salient differences between the two types of representation mentioned above, the next Section I-A recapitulates the work previously done on driving and operating cycles, along with introducing the three fundamental problems identified by Pettersson: the representation, variation, and classification problems [9].

A. PREVIOUS WORKS ON DRIVING AND OPERATING CYCLES
This Section summarizes the previous contributions to the modeling of driving and operating cycles.

1) LITERATURE REVIEW ON THE DRIVING CYCLE REPRESENTATION
In recent years, a great deal of research has produced a number of alternative approaches aimed at synthesizing representative transient driving cycles. In fact, whilst the initially proposed rule-based methods were rapidly abandoned in favour of statistical techniques [11], [12], the latter have proliferated, leading to a plethora of variants. Statistical approaches are typically preferred when synthetically generated speed profiles are required to correlate with specific operating conditions. The considered signals may include information concerning only the vehicle's state, e.g., cruising, idling, acceleration or braking [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], or also account for road grade [24], [25].
The scientific literature distinguishes between four different techniques to construct synthetic driving cycles: micro-trip-based, segment-based, pattern classification and modal cycle construction [26]. The first method initially generates several candidate cycles by resorting to the so-called micro-trip decomposition technique. Multiple approaches to identifying consecutive micro-trips have been proposed, either involving random processes or based on the definition of specific criteria that relate to the modal characteristics of consecutive excursions. Stochastic optimization algorithms are then employed to select the final driving cycle from the initial population [27], [28]. Transient driving cycles synthesized using the micro-trip decomposition include, for example, those for Hong Kong [29], Pune and Chennai, India [23], [30], and Singapore [31]. Segment-based methods synthesize driving cycles mainly starting from the available speed and acceleration signals. A major advantage of this technique is that, compared to the micro-trip-based one, it allows details about the surroundings, including road properties and traffic conditions, to be incorporated [4]. However, since the trips are segmented based on criteria other than consecutive stops, additional constraints need to be imposed when transitioning between segments [32], [33]. Pattern classification methods partition data into kinematic sequences by resorting to statistical techniques [34], [35], [36]. After specifying suitable criteria, the kinematic sequences are classified into heterogeneous classes. Finally, by combining different kinematic sequences based on the statistical properties of each class, an optimal driving cycle is generated. In this context, two approaches frequently employed to synthesize representative cycles from kinematic sequences are cluster analysis and principal component analysis [37].
In the modal cycle method, maximum likelihood estimation techniques are used to cluster the measured speed data into snippets [18], [24], [25]. Relying on a Markov property argument, also supported theoretically by the analyses conducted in [38], driving cycles are then constructed from suitably selected snippets. Advanced formulations, including multi-dimensional Markov chains that integrate speed signals with information concerning the acceleration and road grade, or variable passenger loads, have been proposed, for example, in [24], [25], [26]. A summary of these different techniques, with their advantages and disadvantages, is given in Table 1.
The above-mentioned approaches all aim at synthesizing a unique representative transient cycle starting from a large amount of data. In fact, a single driving cycle is usually sufficient when applying conventional algorithms and routines for vehicle design optimization and selection. In this context, the need for an accurate, individual description is referred to in the literature as the representation problem [9]. On the other hand, reproducing variation in transport operations may lead to a more accurate prediction of the energy performance of road vehicles. Ideally, starting from a single representation of the usage, driving cycles may be constructed to be statistically equivalent, so as to produce a meaningful spread in performance. This line of research was pioneered by the authors of [32] and [33], who proposed a procedure for synthesizing multiple driving cycles starting from a single trip. More specifically, it was shown in [32] and [33] that the energy performance of electric city buses is highly sensitive to fluctuations in both the number of stops and passenger load. Indeed, by treating both quantities as random variables, they deduced that the energy consumption approximately follows a normal distribution. The variability connected to the randomness of the external surroundings, concerning for example temperature and rolling resistance, was also investigated in [39], where the inherent uncertainty was modeled by adding extensive noise to the assumed nominal values of the quantities of interest. The contributions of [32], [33], and [39] essentially explored the variation problem, as defined in [9]. Concerning specifically the driving cycle description, the technique introduced in [32] and [33] is referred to as stochastic generation in Table 1.
Along with the representation and variation problems, however, Pettersson also defines the so-called classification problem [9], in conjunction with the need to properly qualify a road mission (or an entire transport application) using a simplified set of metrics and labels. This is mainly motivated by the fact that, rather than referring to a certain speed profile, vehicle manufacturers and operators usually describe the usage in terms of the characteristics of the environment, which are easier to interpret. In this context, whilst the conventional description in terms of a driving cycle has been successful in addressing the first two problems indicated in [9], the third still deserves particular attention. In fact, a major limitation connected with the driving cycle representation is that the information about the operating environment, including parameters that may dramatically impact energy consumption [1], [2], [3], [4], [40], [41], [42], [43], [44], [45], [46], is often lost or accounted for only implicitly during the synthesis of a representative speed profile.

2) LITERATURE REVIEW ON THE OPERATING CYCLE REPRESENTATION
The OC representation is a relatively novel description developed in [9], [47], and [48] and further extended in [49] and [50]. As opposed to a driving cycle, in the OC framework, the mission properties are modeled separately from those of the surroundings, without postulating a reference speed profile. The external stimuli originating from the operating environment are then converted dynamically into the desired speed by using a driver model. This expedient allows describing road transport missions in a way that is, to a large extent, independent of both driver and vehicle.
The OC format comprises three levels of representation and describes a physical quantity via deterministic or statistical models. More specifically, the bird's-eye view is conceived as a high-level representation, where road missions are labeled by resorting to simplified indicators. It is intrinsically suited to cope with the classification problem [9]. The stochastic operating cycle (sOC) consists of a more formal description and collects all the random parameters that are needed to reproduce the statistical properties of a transport operation. It is the mathematical tool suggested in [9] to address the variation problem. Finally, the deterministic operating cycle (dOC) may be interpreted as a single realization of an sOC, and includes a detailed representation of the operating environment. Similar to a driving cycle, it has been mainly conceived to deal with the representation problem [9].
Stochastic and deterministic models are currently available for road parameters [47], [48], weather, and traffic conditions [49]. Moreover, the authors of [50] have recently shown how to build a piece of consistent and cohesive machinery that connects all three levels of representation, by establishing a set of formal relationships between the bird's-eye view metrics and the statistical models included in the sOC. In [50], this operation was strongly inspired by two already existing classification systems, namely the Global Transport Application (GTA) [51] and the User Factor Description (UFD) adopted by Volvo and Scania, respectively.

B. CONTRIBUTION OF THIS PAPER
So far, the application of the OC format has been limited to individual missions. As already mentioned, a single operating cycle, suitably selected to be representative of the overall usage, may be sufficient for the purpose of design optimization, whereas the energy efficiency of road vehicles should be assessed more comprehensively by considering the intended transport application.
Therefore, from a conceptual and methodological perspective, the first contribution of this paper consists of exploring an additional dimension of the OC framework, concerning the distribution of missions (mainly with respect to the road characteristics, e.g., topography, curvature, speed limits, and stop signs$^{1}$) over the entire population defining the transport application. The focus is on heavy-duty trucks traveling distances typically longer than 50 km. The investigation is systematically conducted by exploiting the statistical and hierarchical structure of the OC description. In particular, the sOC parameters (interpreted as realizations for individual road operations) are modeled in this paper as random variables when defined on the entire population of missions. By doing so, it becomes possible to capture variation in usage due to modifications in the statistical properties of the environment, ultimately resulting in the external stimuli affecting the vehicle's behaviour. The first part of the work is hence dedicated to the stochastic modeling of the sOC parameters. In keeping with the spirit of the OC representation, which strongly endorses the principle of parsimony, these are assumed to be independent of each other and to follow simple distributions that allow for ease of interpretation.
$^{1}$As explained in Section I-A2, the current OC description also contains models for weather and traffic conditions. However, these are not included in the paper. There are multiple reasons for this, the first being that the parameters of the weather and traffic models cannot be directly estimated from log data, and thus cannot provide information about variation within the transport application. If needed for simulation purposes, traffic and weather models may eventually be incorporated starting from previous contributions by the authors [49], [50].
A numerical analysis of the resulting OC description is then conducted. This operation relies heavily on the nonbijective connection between the sOC and the bird's-eye view levels of representation and builds upon some recent results established in [50]. The abstract problem of properly classifying an OC model relates intimately to the aspect of representativeness and may be reformulated as a question: "what type of usage is this OC representative of, and what can be done with such information?". The question is anything but trivial. Indeed, as opposed to the conventional driving cycle representation, an OC describes a transport mission directly in terms of the operating environment, so the notion of representativeness should refer to the characteristics of the surroundings. To answer this question, the statistical properties of the parametrized OC are investigated using some composite variables, which, apart from reducing the relatively large dimension of the original problem, condense the salient features of the individual road missions. In this paper, the definition of the composite variables is inspired by two different classification systems in use by Volvo and Scania, respectively. Moreover, a simple methodology is proposed to synthesize a single set of representative sOC parameters, starting from their original distributions.
The second contribution of this paper is to illustrate some applications of the whole OC machinery to the processes of virtual design, selection and testing, by taking advantage of all three levels of representation comprised in the format. To this end, two different examples are adduced. The first focuses on the single-vehicle perspective and is aimed at assessing energy efficiency in a simulation environment. The second example deals with optimal selection and production planning, based on the detailed distribution of the road missions, in the footsteps of [52], [53], [54]. The methods of data synthesis used in the paper are also implemented in the MATLAB/Simulink® environment.
The remainder of the paper is organized as follows. Section II provides a general introduction to the OC description, with details about each level of representation and the relationships between them. In Section III, the fundamental theory for stochastic road and mission models is recapitulated and extended to account for variability in the transport application. Moreover, the parametrization of the road models is carried out using log data collected from heavy-duty vehicles during real-world operations. Section IV is devoted to the numerical analysis of the developed OC description, using the composite variables inspired by the GTA and UFD systems adopted by Volvo and Scania. An example of application of the entire OC format for vehicle design, selection, and testing is then adduced in Section V. A comprehensive discussion about the methodology presented in this paper, along with its limitations and possible extensions, is presented in Section VI. Finally, the main conclusions, together with some directions for future research, are summarized in Section VII.

II. BACKGROUND ON THE OPERATING CYCLE
The OC format consists of a mathematical framework that describes road transport missions and applications directly in terms of the operating environment. As already mentioned, it comprises three main levels of representation: the bird's-eye view, the sOC, and the dOC, arranged in a hierarchical fashion, as shown in Figure 1. These three descriptions address the classification, variation and representation problems, respectively. In turn, each description groups the corresponding models or parameters into four common categories: road, weather, traffic and mission [9], [47], [49]. This paper considers only the road category, whereas weather and traffic have been discussed extensively in previous works [10], [49], [50].

A. THE BIRD'S-EYE VIEW
Positioned at the top of the pyramid in Figure 1, the bird's-eye view may be regarded as a high-level description of a transport mission. It essentially consists of a collection of labels and metrics, which target either individual missions or entire applications, in order to provide an intuitive understanding of how vehicles are operated on the road. This description is discrete in nature, meaning that it assigns a countable number of labels to a given road mission, corresponding to operating classes that cannot overlap. The bird's-eye view is the main tool to address the classification problem [9]. In previous works, and also in this paper, the metrics and labels for the bird's-eye view have been borrowed from existing classification systems (see, in particular, [50]). An example with the topography parameter is adduced below considering the GTA representation adopted by Volvo, which prescribes four different levels [51]:
1) FLAT if slopes with a grade of less than 3% occur during more than 98% of the driving distance.
2) P-FLAT if slopes with a grade of less than 6% occur during more than 98% of the driving distance.
3) HILLY if slopes with a grade of less than 9% occur during more than 98% of the driving distance.
4) V-HILLY if the other criteria are not fulfilled.
In this case, the bird's-eye view labels coincide with the operating classes FLAT, P-FLAT (predominantly flat), HILLY and V-HILLY (very hilly), whilst the metrics are the values imposed on the road grade (3%, 6% and 9%, respectively) and the probability of occurrence, identically equal to 0.98. Similarly, the User Factor Description (UFD) proposed by Scania considers only three levels:
1) FLAT if at most 20% of the road section inclines more than 2%.
2) HILLY if between 20% and 40% of the road section inclines more than 2%.
3) V-HILLY if more than 40% of the road section inclines more than 2%.
Apart from the number of classes, the main difference from the previous example is that the label is determined by varying the probability of occurrence while keeping the threshold on the road grade constant (2%). For completeness, further details about the GTA and UFD classification systems may be found in Appendix VII-B. The bird's-eye-view description plays a fundamental role in the processes of optimal design and selection of road vehicles, depending on the characteristics of the intended transport application, i.e., the usage. This aspect is discussed in greater detail in Section V.
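The GTA and UFD topography rules listed above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the grade profile is taken to be sampled at equal distances (so that the share of samples equals the share of driving distance), the thresholds are applied to the absolute value of the grade, and the handling of the exact 20% and 40% boundaries in the UFD rule is a choice made here. The function names are hypothetical and do not come from the paper.

```python
from typing import Sequence

# GTA thresholds on the road grade (%) and the required share of distance.
GTA_LEVELS = (("FLAT", 3.0), ("P-FLAT", 6.0), ("HILLY", 9.0))
GTA_SHARE = 0.98


def gta_topography(grades: Sequence[float]) -> str:
    """Label a grade profile (in %, equally spaced samples) with a GTA class."""
    n = len(grades)
    for label, limit in GTA_LEVELS:
        # Share of the distance driven on slopes milder than the limit.
        below = sum(abs(g) < limit for g in grades) / n
        if below > GTA_SHARE:
            return label
    return "V-HILLY"


def ufd_topography(grades: Sequence[float]) -> str:
    """Label the same profile with a UFD class (fixed 2% grade threshold)."""
    # Share of the road section that inclines more than 2%.
    steep = sum(abs(g) > 2.0 for g in grades) / len(grades)
    if steep <= 0.20:   # assumption: the boundary value belongs to FLAT
        return "FLAT"
    if steep <= 0.40:   # assumption: the boundary value belongs to HILLY
        return "HILLY"
    return "V-HILLY"


# Hypothetical profile: 70% of the distance at 1% grade, 30% at 5% grade.
profile = [1.0] * 70 + [5.0] * 30
print(gta_topography(profile), ufd_topography(profile))  # P-FLAT HILLY
```

The example profile shows how the two systems can label the same road differently, since GTA varies the grade threshold at a fixed share of distance, whereas UFD varies the share at a fixed grade threshold.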

B. THE STOCHASTIC OPERATING CYCLE
Being a description with an intermediate level of detail, the sOC condenses the statistical properties of a road mission and serves the purpose of investigating the variation problem [8], [16], [20]. The sOC makes use of a collection of stochastic models arranged in a hierarchical fashion, each equipped with its own set of stochastic parameters (mean, variance, et cetera). The modularity of the sOC description is achieved by assuming that its stochastic models are mutually independent. A degree of realism is nonetheless preserved by introducing two sets of models: primary and secondary (subordinate) ones. In this way, it becomes possible to build a composite, highly diversified structure, which guarantees a certain degree of interaction between the secondary models. At the same time, the need to consider complicated multivariate distributions is eliminated. Specifically, in the sOC description, the primary models for the road and weather categories are intimately tied to the notions of road type and season, as explained more extensively in [50] and, for the road models only, in Section III-A. For completeness, Table 2 lists the complete set of stochastic models, together with their respective categories.

C. THE DETERMINISTIC OPERATING CYCLE
The dOC level is the most suitable tool for modeling an operating cycle when the aim is to represent individual transport missions. It was specifically conceived in [9] as a virtual environment for optimal vehicle design, virtual testing and synthesis of control algorithms, with the objective of addressing the already mentioned representation problem. Its modular structure permits the addition, modification and removal of models and parameters in a very straightforward way.

FIGURE 1. Schematic representation of the pyramidal structure of an OC. A certain transport application includes all the missions that are equivalent according to the bird's-eye-view description (GTA system). The variation within a transport application is instead taken into account by the sOC level of representation. Finally, the dOC description reproduces the variability amongst road operations that are statistically similar. In the figure, $\mathrm{OC}_s$ and $\mathrm{OC}_d$ denote the sets of sOC and dOC parameters, respectively. The diagram assumes a single road type, and thus a unique value for the topography variance $\sigma_Y$ (the reader is referred to Section III-B1 for clarifications).

TABLE 2. Stochastic models and deterministic parameters (dOC parameters) for the sOC and dOC representations. The definitions linear and constant refer to the interpolation strategy adopted for each parameter. The mathematical model of Dirac delta is specified when a parameter is interpreted as an event.
In particular, the dOC regards the sOC models as parameters, interpreted as discrete functions of time and position. Some parameters -like those linked to the road or weather categories -depend explicitly on either position or time; others, like the ones marked in the traffic category, on both. Additionally, the information contained in each parameter is encoded using a scalar or a vector-valued signal (see dimensionality in Table 2), and supplemented with a suitable interpolation model.
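As a rough illustration of how a position-dependent dOC parameter stored as a discrete signal might be evaluated between samples, the following Python sketch implements the linear and constant interpolation strategies, plus a simple count of point events. Which strategy applies to which parameter is specified in Table 2 and is not reproduced here; pairing a speed sign with constant interpolation and a grade profile with linear interpolation is an assumption made for the example. The function names and the sample values are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def evaluate(positions: Sequence[float], values: Sequence[float],
             x: float, mode: str = "linear") -> float:
    """Evaluate a position-sampled parameter at x.

    positions must be strictly increasing; outside the sampled range the
    nearest end value is returned.
    """
    if x <= positions[0]:
        return values[0]
    if x >= positions[-1]:
        return values[-1]
    i = bisect_right(positions, x) - 1  # index of the last sample at or before x
    if mode == "constant":
        return values[i]                # hold the last sample until the next one
    # Linear interpolation between the two neighbouring samples.
    w = (x - positions[i]) / (positions[i + 1] - positions[i])
    return values[i] + w * (values[i + 1] - values[i])


def events_between(event_positions: Sequence[float], a: float, b: float) -> int:
    """Count point events (e.g. stop signs) located in the interval [a, b)."""
    return sum(a <= p < b for p in event_positions)


# Hypothetical signals: grade (%) every 100 m and speed signs (km/h) along the road.
grade = evaluate([0.0, 100.0, 200.0], [0.0, 2.0, 1.0], 150.0)                 # 1.5
limit = evaluate([0.0, 1200.0, 5000.0], [50.0, 80.0, 110.0], 3000.0, "constant")  # 80.0
stops = events_between([300.0, 2500.0, 2600.0], 0.0, 2550.0)                  # 2
print(grade, limit, stops)
```

Treating events as positions rather than sampled values mirrors the idea of an impulse located at a point, which has no meaningful interpolation between occurrences.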

D. RELATIONSHIPS BETWEEN DESCRIPTIONS
The three levels of description presented above are intimately related, and ordered hierarchically, as already illustrated in Figure 1. The connection existing between the sOC and dOC representations is perhaps the most immediate to comprehend. Considering a fixed set of stochastic parameters, a dOC may be interpreted as an individual realization of its equivalent sOC description. In fact, a dOC may always be synthesized starting from a fully parametrized sOC. In this context, it should be emphasized that multiple dOCs generated starting from the same set of sOC parameters are statistically identical, but might exhibit significant differences in practice. On the contrary, considering a specified set of stochastic models, an sOC parametrized from a given dOC is always unique. From these reflections, it may be concluded that the relationship between the dOC and sOC levels is nonbijective.
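The one-to-many relationship between an sOC and its dOC realizations can be illustrated with a small Python sketch. For illustration only, it assumes that the road grade follows a zero-mean, first-order autoregressive process with Gaussian innovations, parametrized by an autoregressive coefficient and an innovation standard deviation (quantities of this kind appear in the nomenclature, but the exact topography model is the one given in Section III-B). The numerical values and function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def synthesize_grade(phi: float, sigma_e: float, n: int, seed: int) -> List[float]:
    """Draw one grade profile (%) from an assumed stationary AR(1) model."""
    rng = random.Random(seed)
    # Stationary standard deviation implied by the AR(1) parameters.
    sigma_y = sigma_e / math.sqrt(1.0 - phi ** 2)
    y = rng.gauss(0.0, sigma_y)
    profile = [y]
    for _ in range(n - 1):
        y = phi * y + rng.gauss(0.0, sigma_e)
        profile.append(y)
    return profile


def sample_std(values: Sequence[float]) -> float:
    """Unbiased sample standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


# Same (hypothetical) stochastic parameters, two different realizations.
phi, sigma_e = 0.9, 0.5
doc_a = synthesize_grade(phi, sigma_e, 20000, seed=1)
doc_b = synthesize_grade(phi, sigma_e, 20000, seed=2)
print(sample_std(doc_a), sample_std(doc_b), sigma_e / math.sqrt(1.0 - phi ** 2))
```

The two profiles differ sample by sample, yet their empirical standard deviations should both lie close to the common theoretical value, which is the sense in which realizations of the same sOC are statistically identical.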
Analogous considerations hold when approaching the higher level of the pyramid, concerning the relationship between the sOC and bird's-eye view, which are both statistical descriptions. The main difference resides in the resolution of such representations. In fact, the bird's-eye view generally encompasses an entire transport application, whereas the sOC usually targets road operations. In this context, the formal relationship subsisting between the two levels may be elucidated considering again the topography classes of the GTA system, whose thresholds are translated by the bird's-eye view into a value for the variance of the road grade. Departing from an individual set of sOC parameters, it is clear that such value may always be determined uniquely. By contrast, for a predetermined GTA class, corresponding to a continuous interval of variances, infinitely many sOCs may exist in theory. In this context, it has recently been shown in [50] that analytical relationships may be established between the sOC parameters and the set of bird's-eye view metrics and thresholds, by formulating the connection between the two representations in terms of probability and expectations. The existence of such mathematical expressions allows labeling a given sOC according to the specific classification system. However, the attention has been restricted so far to consider individual road missions, whereas, as already explained, the bird's-eye view is mainly intended for the classification of entire transport applications. Therefore, this paper precisely investigates the distribution of the sOC parameters over the population of missions that define the application. The new dimension added to the problem permits both qualitatively and quantitatively categorizing the usage from the perspective of energy performance. The analysis is limited to the road models and focuses principally on heavy-duty trucks operating on long distances.
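To make the link between the GTA thresholds and the variance of the road grade concrete, the following Python sketch works under the assumption, made here for illustration, that the grade $Y$ is zero-mean and normally distributed with standard deviation $\sigma_Y$, and that the thresholds apply to $|Y|$. Under that assumption, $\mathbb{P}(|Y| < y) > 0.98$ is equivalent to $\sigma_Y < y / \Phi^{-1}(0.99)$, with $\Phi^{-1}$ the standard normal quantile function. The exact relationships used in the OC framework are those derived in [50]; the function names below are hypothetical.

```python
from statistics import NormalDist


def max_grade_std(limit: float, share: float = 0.98) -> float:
    """Largest sigma_Y (%) for which P(|Y| < limit) still exceeds `share`,
    assuming Y ~ N(0, sigma_Y^2)."""
    # P(|Y| < limit) = 2 * Phi(limit / sigma) - 1  =>  Phi(limit / sigma) = (1 + share) / 2
    z = NormalDist().inv_cdf(0.5 + share / 2.0)
    return limit / z


def gta_class_from_std(sigma_y: float) -> str:
    """Map a grade standard deviation (%) to a GTA topography class."""
    for label, limit in (("FLAT", 3.0), ("P-FLAT", 6.0), ("HILLY", 9.0)):
        if sigma_y < max_grade_std(limit):
            return label
    return "V-HILLY"


for limit in (3.0, 6.0, 9.0):
    print(limit, round(max_grade_std(limit), 3))
print(gta_class_from_std(2.0))  # P-FLAT
```

Each GTA class then corresponds to an interval of admissible standard deviations (roughly 0 to 1.29%, 1.29% to 2.58%, 2.58% to 3.87%, and above), which illustrates why a given class is compatible with infinitely many sOCs.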

III. STOCHASTIC ROAD MODELS
The distribution of the transport missions for long-haul heavy trucks is analyzed departing from the stochastic road models included in the sOC representation [47], [49], [50]. As briefly anticipated in Section II-B, these are divided into primary and secondary ones. The primary model essentially defines the road type, whereas the secondary models describe the spatial evolution of the quantities in interest along the road, and inherit their parameters from the corresponding road type [47]. The structure of the model is exemplified in Figure 2, where only three road types, namely urban, rural, and highway, have been considered. In this paper, the models are parametrized starting from log data collected from 33 heavy-duty vehicles operating in the area of Västra Götaland, Sweden. Hereinafter, the dataset will be consistently referred to as the OCEAN dataset. Details about the OCEAN dataset and the estimation techniques used for the following analyses are omitted from the current discussion and reported for completeness in Appendix VII-A. Instead, Sections III-A and III-B recall the stochastic models for road types, topography, curviness, stop signs, and legal speeds. Furthermore, a novel formulation for the mission length is introduced in Section III-C.
It is worth clarifying that all the road models discussed in this paper have already been validated in previous works [9], [47], [49], [50]. Moreover, whilst the mathematical structure of such models is very general and not limited to a specific geographical area, a comparison between the actual distribution of the road properties and the parametrized descriptions is shown in [50] for the Västra Götaland region specifically.
In the remainder of the paper, the notation is as follows: for a generic random variable $A : \mathcal{A} \to S_A$, its realizations are denoted by $a$, unless specified otherwise. The probability and expectation operators are denoted as $\mathbb{P}(\cdot)$ and $\mathbb{E}(\cdot)$, respectively; variance and covariance as $\mathrm{Var}(\cdot)$ and $\mathrm{Cov}(\cdot)$. The set of real numbers is denoted by $\mathbb{R}$; the sets of positive and negative real numbers are denoted by $\mathbb{R}_{\geq 0}$, $\mathbb{R}_{\leq 0}$ when including zero and by $\mathbb{R}_{>0}$, $\mathbb{R}_{<0}$ when excluding it. The set of positive integer numbers is denoted by $\mathbb{N}$, whereas $\mathbb{N}_0$ denotes the extended set of positive integers including zero, i.e., $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. Sequences of random variables are denoted by $\{A_k\}_k$ (the subscript $k$ is often dropped when clarity allows). Finally, the acronyms PMF, PDF and CDF stand for probability mass function, probability density function, and cumulative distribution function, respectively.

A. ROAD TYPES
The road type represents the primary model for the road category. In particular, the sOC description models the sequence of road types along the vehicle's route depending on the value of the legal speed on each road segment. The two notions are intimately related, as suggested previously in [55]. The stochastic model presented in this paper first postulates the existence of $n_r$ different road types $r_t \in \{r_1, \dots, r_{n_r}\}$. These are uniquely defined starting from a sequence of $n_r - 1$ characteristic speeds, ordered in ascending magnitude (a specific number $n_{v|r_i}$ of speed signs is associated with each road type). The characteristic speeds mark the transition between two consecutive road types, which are in turn treated as a random variable $R_t$, assuming values $r_t \in S_{R_t} = \{r_1, \dots, r_{n_r}\}$, as a function of the speed sign $V(x)$ along the road. The resulting description consists of a sequence of locations and road types, $\{X_k, R_{t,k}\}_{k \in \mathbb{N}}$. Since the Markov property is assumed to hold [56], [57], the conditional probability of transitioning to a specific road type reads

$$P(R_{t,k+1} = r_j \mid R_{t,k} = r_i) = p_{Rij},$$

where the generic road type $r_i$ is an element of the road type vector $R_t = [r_1 \dots r_{n_r}]^T$. Moreover, by modeling the locations $X_k$ for the road types as a Poisson process, the complete model is then described by a continuous-time Markov chain, parametrized by the entries $p_{Rij}$ of the single-step transition matrix $P_R \in \mathbb{R}^{n_r \times n_r}_{\geq 0}$ and the $n_r$ intensities $\lambda_{Ri}$, reading

$$\lambda_{Ri} = \frac{1}{L_{Ri}}, \quad i = 1, \dots, n_r,$$

with $L_{Ri}$ being the mean length of the road type $r_i$, collected in a vector $L_R = [L_{R1} \dots L_{Rn_r}]^T$. It should be noticed that, in the construction above, no self-transitions are allowed, i.e., $p_{Rii} = 0$, $i = 1, \dots, n_r$, which automatically implies

$$\sum_{j \neq i} p_{Rij} = 1, \quad i = 1, \dots, n_r.$$

Starting from the stochastic model for road types detailed above, the stationary distribution$^{2}$ $\pi_R$ of the overall process may be derived as the solution of the system$^{3}$ [50]

$$\pi_R^T G_R = 0, \qquad \sum_{i=1}^{n_r} \pi_{Ri} = 1, \tag{3}$$

where the entries $g_{Rij} = g_{Rij}(p_{Rij}, L_{Ri})$ of the generator matrix $G_R$ may be calculated as

$$g_{Rij} = \begin{cases} p_{Rij}/L_{Ri}, & i \neq j, \\ -1/L_{Ri}, & i = j. \end{cases} \tag{4}$$

Equations (3) and (4) describe the stationary distribution of the road types along a road transport mission, as a function of the observed number of transitions between road types and their mean lengths. The analytical expression for the stationary vector $\pi_R = \pi_R(P_R, L_R)$ is reported in Appendix VII-C1 for $n_r = 3$.
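The construction of the stationary distribution can be illustrated numerically. The following is a minimal sketch, assuming the standard continuous-time Markov chain generator with off-diagonal rates $p_{Rij}/L_{Ri}$ and diagonal entries $-1/L_{Ri}$; the function names and all numerical values are hypothetical and are not taken from the OCEAN dataset.

```python
import numpy as np

def generator_matrix(P, mean_lengths):
    """Build a generator matrix from a jump matrix P (zero diagonal) and mean lengths."""
    P = np.asarray(P, dtype=float)
    lam = 1.0 / np.asarray(mean_lengths, dtype=float)  # intensities, 1 / mean length
    G = lam[:, None] * P           # off-diagonal rates p_ij / L_i
    np.fill_diagonal(G, -lam)      # diagonal -1 / L_i, so that every row sums to zero
    return G

def stationary_distribution(G):
    """Solve pi^T G = 0 together with sum(pi) = 1 as one least-squares problem."""
    n = G.shape[0]
    A = np.vstack([G.T, np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical values for three road types (urban, rural, highway).
P_R = [[0.0, 0.8, 0.2],
       [0.5, 0.0, 0.5],
       [0.1, 0.9, 0.0]]
L_R = [2.0, 8.0, 25.0]  # mean lengths in km (assumed)

G_R = generator_matrix(P_R, L_R)
pi_R = stationary_distribution(G_R)
print(pi_R)  # share of the route spent on each road type
```

Appending the normalization condition as an extra row avoids having to pick which of the linearly dependent balance equations to drop.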
For an individual transport mission, the total probability, expectation, and variance of a random variable may be calculated by weighted summation over the different road types, using the corresponding total laws. Indeed, in the sOC representation, road segments belonging to the same road type are described using the same values for the sOC parameters of the secondary models. Denoting by $A$ a generic random variable, the formulae are hence given as follows:

$$P(A) = \sum_{i=1}^{n_r} \pi_{Ri}\, P(A \mid R_t = r_i), \tag{5a}$$

$$E(A) = \sum_{i=1}^{n_r} \pi_{Ri}\, E(A \mid R_t = r_i), \tag{5b}$$

$$\mathrm{Var}(A) = \sum_{i=1}^{n_r} \pi_{Ri}\, \mathrm{Var}(A \mid R_t = r_i) + \sum_{i=1}^{n_r} \pi_{Ri} \big( E(A \mid R_t = r_i) - E(A) \big)^2, \tag{5c}$$

where the analytical expressions for the probabilities $P(A \mid R_t = r_i)$ and expectations $E(A \mid R_t = r_i)$ may be derived starting from the secondary models illustrated in Section III-B.
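The weighted summation over road types can be sketched in a few lines. The example below applies the laws of total expectation and total variance to a generic quantity; the stationary probabilities and the conditional moments are hypothetical numbers chosen only for illustration.

```python
import numpy as np

def total_moments(pi, cond_mean, cond_var):
    """Combine per-road-type moments into mission-level mean and variance."""
    pi = np.asarray(pi, dtype=float)
    m = np.asarray(cond_mean, dtype=float)
    v = np.asarray(cond_var, dtype=float)
    mean = np.sum(pi * m)                       # law of total expectation
    within = np.sum(pi * v)                     # average conditional variance
    between = np.sum(pi * (m - mean) ** 2)      # variance of the conditional means
    return mean, within + between               # law of total variance

# Hypothetical stationary probabilities and conditional moments for three road types.
pi_R = [0.1, 0.3, 0.6]
cond_mean = [40.0, 70.0, 90.0]
cond_var = [25.0, 16.0, 9.0]

mean_A, var_A = total_moments(pi_R, cond_mean, cond_var)
print(mean_A, var_A)
```

The second term of the variance is usually the larger one when the road types differ markedly, which is why a mission mixing road types can be far more variable than any single road type.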
On the other hand, the sOC parameters for the road type may also be interpreted as realizations of random variables describing their distributions amongst transport applications. Starting with the probabilities $p_{Rij}$ of the single-step transition matrix, it should be observed that they need to satisfy $\sum_{j \neq i} p_{Rij} = 1$, $i = 1, \dots, n_r$, since, by model construction, there are no self-transitions. Therefore, denoting by $\tilde{P}_{Ri}$ the vector of random variables whose realization $P_{Ri}$ collects the probabilities $p_{Rij}$ for each row (with $i \neq j$), a Dirichlet distribution is proposed in this paper, i.e., $\tilde{P}_{Ri} \sim \mathrm{Dir}(\alpha_{Ri})$. The stochastic row vectors $\tilde{P}_{Ri}$ may be concisely organized into a stochastic matrix $\tilde{P}_R$.
For a given road mission, the mean lengths $L_{Ri}$, $i = 1, \dots, n_r$, may be analogously interpreted as realizations of random variables $\tilde{L}_{Ri}$ defined over the space of the entire transport application. In this paper, the stochastic lengths $\tilde{L}_{Ri}$ are assumed to be independently distributed. More specifically, they are modeled as lognormal variables, that is, $\ln \tilde{L}_{Ri} \sim \mathcal{N}(\mu_{\tilde{L}_{Ri}}, \sigma^2_{\tilde{L}_{Ri}})$. The stochastic mean lengths are collected in the vector $\tilde{L}_R = [\tilde{L}_{R1} \dots \tilde{L}_{Rn_r}]^T$. For what follows, it is crucial to understand that treating the matrix $\tilde{P}_R$ and the vector of mean lengths $\tilde{L}_R$ as stochastic variables implies that the stationary distribution of the road type becomes a function of random variables, and thus a random variable itself. Hereinafter, the stochastic counterpart of the stationary vector $\pi_R$ is denoted by $\Pi_R = \Pi_R(\tilde{P}_R, \tilde{L}_R)$ when considering a population of random missions. It follows that the probabilities and expectations calculated according to (5) may in turn be interpreted as random variables, depending on the distribution of the primary and secondary road models. Figure 3 illustrates the distribution of the mean lengths $\tilde{L}_{Ri}$, $i = 1, 2, 3$, for the three road types considered in the paper, namely urban, rural, and highway, along with the lognormal model fitted from the OCEAN dataset.
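To make the idea of a random stationary distribution concrete, the sketch below draws a small population of missions: each row of the transition matrix comes from a Dirichlet distribution over its off-diagonal entries, and each mean length from a lognormal distribution. The Dirichlet and lognormal parameters are hypothetical placeholders, not the values fitted from the OCEAN dataset, and the helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_transition_matrix(alpha_rows, rng):
    """Draw each row's off-diagonal probabilities from Dir(alpha); diagonal stays zero."""
    n = len(alpha_rows)
    P = np.zeros((n, n))
    for i, alpha in enumerate(alpha_rows):
        off_diag = [j for j in range(n) if j != i]
        P[i, off_diag] = rng.dirichlet(alpha)
    return P

def stationary(P, L):
    """Stationary distribution of the chain with jump matrix P and mean lengths L."""
    lam = 1.0 / L
    G = lam[:, None] * P
    np.fill_diagonal(G, -lam)
    A = np.vstack([G.T, np.ones(len(L))])
    b = np.append(np.zeros(len(L)), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Hypothetical parameters for three road types.
alpha_R = [[4.0, 1.0], [2.0, 2.0], [1.0, 6.0]]  # one Dirichlet vector per row
mu_L = np.array([0.5, 2.0, 3.0])                # lognormal location of the mean lengths
sigma_L = np.array([0.4, 0.5, 0.6])             # lognormal scale of the mean lengths

samples = np.array([
    stationary(sample_transition_matrix(alpha_R, rng),
               rng.lognormal(mean=mu_L, sigma=sigma_L))
    for _ in range(1000)
])
print(samples.mean(axis=0))  # average share of each road type over the population
```

Each row of `samples` is one realization of the stochastic stationary vector, so histograms of its columns approximate the marginal distributions of the road type shares.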

B. SECONDARY ROAD MODELS
This Section illustrates the secondary road models, which include road topography, curviness, stop signs, and speed signs.

1) ROAD TOPOGRAPHY
In the sOC description, the road topography $Y_k$ is assumed to behave as a stationary, first-order autoregressive AR(1) process [40], [41]:

$$Y_k = \phi_{Y|r_i} Y_{k-1} + e_{Y,k}, \qquad e_{Y,k} \sim \mathcal{N}(0, \sigma^2_{e_Y|r_i}), \tag{6}$$

where $\phi_{Y|r_i} \in (-1, 1)$ and $\sigma_{e_Y|r_i} \in (0, \infty)$ are the two characteristic parameters that depend on the given road type. According to (6), on each road type, the road grade itself follows a normal distribution with zero mean [58], i.e.,

$$Y \mid R_t = r_i \sim \mathcal{N}(0, \sigma^2_{Y|r_i}). \tag{7}$$

Note that in (7) the subscript $k$ has been dropped to simplify the notation. Starting from (6) and (7), the conditional variance of the process may also be deduced as

$$\sigma^2_{Y|r_i} = \mathrm{Var}(Y \mid R_t = r_i) = \frac{\sigma^2_{e_Y|r_i}}{1 - \phi^2_{Y|r_i}}, \tag{8}$$

where the autoregressive coefficient $\phi_{Y|r_i}$ may also be reinterpreted as a function of the mean hill length $L_{h|r_i}$ for the road type $r_i$ through (9), with $L_s$ being the sampling length. Owing to these assumptions, the conditional standard deviation $\sigma_{Y|r_i}$ and the mean hill length $L_{h|r_i}$ condense all the information about the road topography.
An important indicator that may be used to qualify a road segment with respect to the topography model is the proportion $p_{y|r_i}$ between the length of the road for which the absolute value of the road grade is below a specified threshold $y$, and the total length. As shown again in [50], this may be stated mathematically in terms of a conditional probability as follows:

$$p_{y|r_i} = P(|Y| \leq y \mid R_t = r_i) = 2\Phi\!\left(\frac{y}{\sigma_{Y|r_i}}\right) - 1, \tag{10}$$

where $\Phi(\cdot)$ denotes the CDF of the standard normal distribution. Equation (10) is valid for a single road type; however, since the ratio $p_{y|r_i}$ may be interpreted as a conditional probability, the total probability may be calculated with the aid of (5a). Combining the example in Section II-A with (8), it may be inferred that, if a single road type is considered ($n_r = 1$), the standard deviation $\sigma_Y \equiv \sigma_{Y|r_1}$ already incorporates all the information needed to classify a road segment according to both the GTA and UFD systems. For this simplified case, the values of $\sigma_Y \equiv \sigma_{Y|r_1}$ marking the transition between the different classes have been calculated in [50]. For a fixed road type, the characteristic parameters $\sigma_{Y|r_i}$ and $L_{h|r_i}$ are allowed to vary over the entire population of transport missions. More specifically, in this paper, both parameters are modeled statistically using a lognormal distribution, with $\Sigma_{Y|r_i}$ and $\tilde{L}_{h|r_i}$ denoting the random variables for the road topography standard deviation and mean hill length, respectively. Clearly, also in this case, when the variance of the process is treated as a random variable, the proportion $p_{y|r_i}$ in (10) becomes, in turn, a random variable. Figure 4 shows the distributions of both quantities for each road type (namely urban, rural, and highway), together with the fitted lognormal models, whose parameters were estimated from the OCEAN dataset.
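The link between the AR(1) grade model and the grade proportion can be checked by simulation. The sketch below assumes a zero-mean Gaussian AR(1) process, for which the stationary variance is the innovation variance divided by one minus the squared coefficient, and compares the simulated share of samples with absolute grade below a threshold against the closed-form normal probability. The coefficient, standard deviation, and threshold are hypothetical, and the grade is taken to be expressed in percent.

```python
import math
import numpy as np

def grade_proportion(y, sigma_Y):
    """P(|Y| <= y) for Y ~ N(0, sigma_Y^2), written with the error function."""
    return math.erf(y / (sigma_Y * math.sqrt(2.0)))

def simulate_ar1(phi, sigma_Y, n, rng):
    """Simulate n samples of a stationary AR(1) grade profile with std sigma_Y."""
    sigma_e = sigma_Y * math.sqrt(1.0 - phi ** 2)  # innovation std from the stationary variance
    e = rng.normal(0.0, sigma_e, size=n)
    y = np.empty(n)
    y[0] = rng.normal(0.0, sigma_Y)                # start from the stationary distribution
    for k in range(1, n):
        y[k] = phi * y[k - 1] + e[k]
    return y

rng = np.random.default_rng(1)
phi, sigma_Y = 0.9, 2.0                            # hypothetical parameters
grade = simulate_ar1(phi, sigma_Y, 100_000, rng)

empirical = float(np.mean(np.abs(grade) <= 3.0))   # simulated share of road with |grade| <= 3
analytic = grade_proportion(3.0, sigma_Y)
print(empirical, analytic)
```

Because consecutive samples are strongly correlated for coefficients close to one, the simulated proportion converges more slowly than it would for independent samples, so long profiles are needed for a tight match.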

2) ROAD CURVINESS
The sOC representation treats the curves along the road as isolated events, modeled using a sequence of locations, curvatures, and lengths $\{X_k, C_k, L_k\}_{k \in \mathbb{N}}$. The resulting model, referred to as the curviness of the road, was introduced in [47] and is inspired by the description proposed in [59]. In particular, for each road type, the locations are assumed to follow a Poisson process (11), whose intensity $\lambda_{C|r_i} \in (0, \infty)$ should be interpreted as the mean number of curves per unit of distance along a certain road type. The curvature $C_k$ is modeled using a modified lognormal distribution (12), in which the parameter $r_{\text{turn}} \in (0, \infty)$ appears because roads are constructed with a lower-bounded radius. In theory, $r_{\text{turn}}$ is not a statistical measure, but rather an inherent property of the road type. In this paper, the same value of $r_{\text{turn}}$ is used for all the road types. Finally, the curve lengths $L_k$ are modeled using a lognormal distribution:

$$\ln L_k \mid R_t = r_i \sim \mathcal{N}(\mu_{L|r_i}, \sigma^2_{L|r_i}). \tag{13}$$

Two important indicators that may be used to qualify a road based on the curviness parameter, using a reduced number of variables, relate to the expected number $\bar{n}'_{C|r_i}$ of curves per unit of length for which the curvature exceeds a minimum value, and to the proportion of the road $p_{\kappa|r_i}$ for which the curvature is below a threshold $\kappa$. Denoting by $N'_C$ a Binomial variable describing the number of curves for which the curvature exceeds a minimum threshold $\kappa$, the first criterion may be formalized mathematically as a conditional expectation (14) [50], in which $L_{\text{tot}}$ denotes the total length of the road segment. Equation (14) is the same type of relationship prescribed by the UFD system (see Appendix VII-B). In particular, it may be observed that the characteristic parameters $\mu_{L|r_i}$ and $\sigma_{L|r_i}$ do not appear in (14), implying that two roads that differ only in the distribution of the curve lengths are equivalent according to this criterion.

On the other hand, describing a segment in terms of curvature distribution along the road, as done by the GTA system, provides a more general relationship, involving all the characteristic parameters of the model. To this end, the curvature may be regarded as a continuous function $K(X)$ of the coordinate along the road, interpreted in turn as a stochastic variable. Accordingly, the conditional distribution of the curvature may be derived approximately as in (15) [50]. Equations (14) and (15) correspond again to a conditional expectation and probability, respectively, and thus the total expectation and probability may be calculated using (5). The resulting formulae would completely qualify a single transport mission with respect to the curviness model. However, all the characteristic parameters also vary over the set of missions defining the transport application. Starting with the curve intensity, it is assumed that, for each individual road type, the corresponding stochastic variable $\Lambda_{C|r_i}$ is lognormally distributed, that is, $\ln \Lambda_{C|r_i}$ is normally distributed. The comparison between the measured distributions and the models fitted from the OCEAN dataset is shown in Figure 5.
When the log-radius described by (12) is allowed to vary over the population of missions, the corresponding random variables $M_{C|r_i}$ and $\Sigma_{C|r_i}$ are modeled using a normal and a lognormal distribution, respectively. Figure 6 illustrates the comparison between the measured distribution and the fitted models for both quantities, according to the three road types considered in the paper.
Similar models, i.e., a normal and a lognormal distribution, are also employed to describe the variation of the parameters for the curve length modeled as in (13). The analytical PDF, together with the distribution extracted from the OCEAN dataset, is plotted in Figure 7 for both quantities.
Generally speaking, it may be noticed that the agreement is satisfactory for the variables relating to the urban and rural roads, whereas the data collected on highways exhibit larger discrepancies with the fitted model and a more skewed distribution. The reason for this, which partially legitimates the assumptions on the chosen distribution, should be ascribed to the method applied in estimating the curvature from log data, which relies on measurements of the yaw rate. In this context, the reaction of the driver to very large values of the curvature radius may not be captured by such measurements, implying that small curvatures are automatically filtered out.
It is worth observing that, since all the characteristic parameters of the model are regarded as random variables when considering a population of missions, the quantities $\bar{n}'_{C|r_i}$ and $p_{\kappa|r_i}$ in (14) and (15) also become random variables.
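The curviness model can be illustrated with a short simulation. The exact forms of the Poisson and modified lognormal models are not reproduced here, so the sketch below rests on explicit assumptions: curve locations form a homogeneous Poisson process, the curve radius equals the minimum turning radius plus a lognormal variable, and the curvature is the reciprocal of the radius. Under these assumptions, the expected number of curves per kilometre sharper than a threshold follows from thinning the Poisson process. All names and numerical values are hypothetical.

```python
import math
import numpy as np

def sample_curves(lam_C, mu_C, sigma_C, mu_L, sigma_L, r_turn, L_tot, rng):
    """Sample curve locations (km), curvatures (1/m) and lengths (m) on a road of L_tot km."""
    n = rng.poisson(lam_C * L_tot)                    # number of curves on the road
    x = np.sort(rng.uniform(0.0, L_tot, size=n))      # Poisson-process locations
    radius = r_turn + rng.lognormal(mu_C, sigma_C, size=n)  # assumed shifted-lognormal radius
    length = rng.lognormal(mu_L, sigma_L, size=n)
    return x, 1.0 / radius, length

def expected_sharp_curves(lam_C, mu_C, sigma_C, r_turn, kappa):
    """Expected curves per km with curvature above kappa, under the assumptions above."""
    z = (math.log(1.0 / kappa - r_turn) - mu_C) / sigma_C
    prob_sharp = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # P(radius < 1 / kappa)
    return lam_C * prob_sharp

rng = np.random.default_rng(5)
lam_C, mu_C, sigma_C = 2.0, 5.5, 0.8   # curves per km; log-radius parameters (radius in m)
mu_L, sigma_L = 4.0, 0.5               # log-length parameters (length in m)
r_turn, kappa, L_tot = 10.0, 0.005, 20_000.0

x, curvature, length = sample_curves(lam_C, mu_C, sigma_C, mu_L, sigma_L, r_turn, L_tot, rng)
empirical = np.count_nonzero(curvature > kappa) / L_tot
analytic = expected_sharp_curves(lam_C, mu_C, sigma_C, r_turn, kappa)
print(empirical, analytic)
```

As in the criterion based on the number of sharp curves, the curve lengths play no role in this count: changing `mu_L` or `sigma_L` leaves both numbers unchanged.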

3) STOP SIGNS
In the sOC representation, stop signs$^{4}$ are treated as independent events and described by the sequence $\{X_k, T_{s,k}\}_{k \in \mathbb{N}}$, where $X_k$ is again the location, and $T_{s,k}$ is interpreted as a recommended time. A Poisson model similar to that in (11), but with intensity $\lambda_{s|r_i}$, is used to model the distance between two consecutive stops. Moreover, for each road type, the recommended time $T_{s,k}$ is assumed to be uniformly distributed between a minimum and a maximum value, i.e., $T_{s,k} \mid R_t = r_i \sim \mathcal{U}(t_{\min|r_i}, t_{\max|r_i})$. Usually, the conditional intensities $\lambda_{s|r_i}$ are sufficient to completely qualify a road mission with respect to the stop signs model, whereas the stop times are not involved in the reduced description.
Considering the entire population of missions spanning the transport application, the corresponding random variables for the intensity of the event and for the minimum and maximum times are denoted by $\Lambda_{s|r_i}$, $T_{\min|r_i}$ and $T_{\max|r_i}$, respectively. As usual, the conditional intensity $\Lambda_{s|r_i}$ is modeled using a lognormal distribution, that is, $\ln \Lambda_{s|r_i} \sim \mathcal{N}(\mu_{\Lambda_{s|r_i}}, \sigma^2_{\Lambda_{s|r_i}})$. The minimum and maximum values for the recommended waiting time are instead modeled using a Gamma and a modified Gamma distribution, respectively. In particular, it is first assumed that the minimum time is Gamma distributed with shape and rate parameters $\alpha_{T_{\min|r_i}}, \beta_{T_{\min|r_i}} \in (0, \infty)$, i.e., $T_{\min|r_i} \sim \mathrm{Ga}(\alpha_{T_{\min|r_i}}, \beta_{T_{\min|r_i}})$. Then, the stochastic model for the maximum time is constructed by adding a positive increment, that is, $T_{\max|r_i} = T_{\min|r_i} + \Delta T_{|r_i}$, where $\Delta T_{|r_i} \sim \mathrm{Ga}(\alpha_{\Delta T|r_i}, \beta_{\Delta T|r_i})$. This approach ensures that any realization $t_{\max|r_i}$ is always greater than the corresponding lower limit $t_{\min|r_i}$. Apart from the obvious advantages connected with its support, the Gamma distribution is chosen in this paper for two additional reasons: first, it represents a generalization of the exponential distribution; second, if the rate parameters $\beta_{T_{\min|r_i}}$ and $\beta_{\Delta T|r_i}$ happen to coincide, then the maximum time $T_{\max|r_i}$ is itself Gamma distributed.
The distributions for the stop signs parameters are compared to the proposed stochastic models in Figures 8 and 9.
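The construction of the maximum stop time as the minimum time plus a positive Gamma increment can be sketched as follows. The shape and rate values are hypothetical; equal rates are chosen deliberately so that the maximum time is itself Gamma distributed, with shape equal to the sum of the two shapes, which the sample moments should reflect.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_stop_time_bounds(alpha_min, beta_min, alpha_delta, beta_delta, size, rng):
    """Draw (t_min, t_max) pairs with t_max = t_min + positive Gamma increment."""
    # NumPy parametrizes the Gamma distribution by shape and scale, so scale = 1 / rate.
    t_min = rng.gamma(shape=alpha_min, scale=1.0 / beta_min, size=size)
    delta = rng.gamma(shape=alpha_delta, scale=1.0 / beta_delta, size=size)
    return t_min, t_min + delta

# Hypothetical parameters with equal rates, so that T_max ~ Ga(2 + 3, 0.5).
t_min, t_max = sample_stop_time_bounds(2.0, 0.5, 3.0, 0.5, 100_000, rng)

# For each pair of bounds, one recommended stop time drawn uniformly between them.
stop_time = rng.uniform(t_min, t_max)

print(t_max.mean(), t_max.var())  # compare with shape / rate and shape / rate^2
```

With the values above, the theoretical mean and variance of the maximum time are 5 / 0.5 = 10 s and 5 / 0.25 = 20 s², respectively.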

4) SPEED SIGNS
Similarly to the model introduced in Section III-A, for a given road type$^{5}$ $r_k$, the speed signs are regarded as piecewise constant, right-continuous functions of the position [47]. In particular, speed signs are modeled as a random process $V = V(x)$ of the position along the road. Accordingly, the variable $V(x)$ assumes discrete values in the state space $S_{V|r_k} = \{v_{1|r_k}, \dots, v_{n_{v|r_k}|r_k}\}$, where $n_{v|r_k}$ denotes the finite number of possible speed limits for the road type $r_k$. Also in this case, the complete model collects a sequence of positions, marked with the corresponding values for the legal speed, i.e., $\{X_k, V_k\}_{k \in \mathbb{N}}$. More specifically, the sequence of speed limits is approximated using a Markov chain [56], [57], and assumes discrete values $v_{i|r_k}$ organized into the speed vector $v_{|r_k} = [v_{1|r_k} \dots v_{n_{v|r_k}|r_k}]^T$.
Accordingly, the entries of the conditional Markov probability matrix $P_{V|r_k} \in \mathbb{R}^{n_{v|r_k} \times n_{v|r_k}}_{\geq 0}$ fully characterize the discrete chain, with $p_{Vij|r_k}$ modeling the conditional probability of transitioning from state $i$ to state $j$. Since no self-transitions are allowed, they satisfy $\sum_{j \neq i} p_{Vij|r_k} = 1$, $i = 1, \dots, n_{v|r_k}$. The speed sign locations are again modeled as in (11). For each road type, the $n_{v|r_k}$ intensities $\lambda_{V1|r_k}, \dots, \lambda_{Vn_{v|r_k}|r_k}$ may be deduced from the corresponding mean lengths $L_{Vi|r_k}$:

$$\lambda_{Vi|r_k} = \frac{1}{L_{Vi|r_k}}, \quad i = 1, \dots, n_{v|r_k},$$

collected into a vector $L_{V|r_k} = [L_{V1|r_k} \dots L_{Vn_{v|r_k}|r_k}]^T$. The resulting model is completely parametrized by the conditional probabilities $p_{Vij|r_k}$ and the $n_{v|r_k}$ mean lengths $L_{Vi|r_k}$ (or, alternatively, the intensities $\lambda_{Vi|r_k}$). Additionally, it should be observed that the speed $V(x)$ itself behaves as a continuous-time Markov chain [56], since the distance between consecutive transitions is modeled using a Poisson process. In particular, the stationary distribution $\pi_{V|r_k}$ of the overall process may be derived from its generator matrix $G_{V|r_k}$, and satisfies the usual set of equations

$$\pi_{V|r_k}^T G_{V|r_k} = 0, \qquad \sum_{i=1}^{n_{v|r_k}} \pi_{Vi|r_k} = 1, \tag{17}$$

where the entries $g_{Vij|r_k} = g_{Vij|r_k}(p_{Vij|r_k}, L_{Vi|r_k})$ of $G_{V|r_k}$ are given by

$$g_{Vij|r_k} = \begin{cases} p_{Vij|r_k}/L_{Vi|r_k}, & i \neq j, \\ -1/L_{Vi|r_k}, & i = j. \end{cases} \tag{18}$$

Closed-form expressions for $\pi_{V|r_k} = \pi_{V|r_k}(P_{V|r_k}, L_{V|r_k})$ are reported in Appendix VII-C2 for the cases $n_{v|r_k} = 2$ and $3$, respectively. A parameter of fundamental importance for what follows is the conditional expectation of the legal speed over a segment. For a given road type, this may be computed as in [50]:

$$\bar{v}_{|r_k} = E(V \mid R_t = r_k) = \sum_{i=1}^{n_{v|r_k}} \pi_{Vi|r_k}\, v_{i|r_k}. \tag{19}$$

It should be emphasized that, for an individual transport mission, the expected value $\bar{v}_{|r_k} = \bar{v}_{|r_k}(P_{V|r_k}, L_{V|r_k})$ derived in (19) may be expressed as a function of the single-step transition probabilities and the mean speed lengths.
Similarly to what was done for the road type model in Section III-A, for a given road type, each row of the transition matrix $P_{V|r_k}$ is assumed to follow a Dirichlet distribution, that is, $\tilde{P}_{Vi|r_k} \sim \mathrm{Dir}(\alpha_{Vi|r_k})$, where $\tilde{P}_{Vi|r_k}$ is a vector of random variables for the conditional transition probabilities from state $i$ for a given road type, and $\alpha_{Vi|r_k} \in \mathbb{R}^{n_{v|r_k}-1}_{>0}$ is a vector of parameters. For ease of notation, the row vectors $\tilde{P}_{Vi|r_k}$ may be organized into the stochastic matrix $\tilde{P}_{V|r_k}$.
As usual, for each road type $r_k$, when interpreted as random variables over the entire population of missions, the mean speed lengths $\tilde{L}_{Vi|r_k}$ are assumed to be lognormally distributed, i.e., $\ln \tilde{L}_{Vi|r_k} \sim \mathcal{N}(\mu_{\tilde{L}_{Vi|r_k}}, \sigma^2_{\tilde{L}_{Vi|r_k}})$. Figure 10 illustrates the distributions of the mean speed lengths for each speed and road type, according to the OCEAN dataset. It should be noticed that the parameters for the random variable $\tilde{L}_{V3|r_3}$ (corresponding to a speed of $110\,\text{km}\,\text{h}^{-1}$) could not be estimated from the available measurements. In fact, the single-step transition matrix $P_{V|r_3}$ for the highway road reduced to a deterministic two-by-two matrix.
When regarding the single-step transition probabilities and the mean speed lengths as random variables defined over the space of transport missions, the generic stationary distribution vector $\pi_{V|r_k}$ becomes a random vector $\Pi_{V|r_k}(\tilde{P}_{V|r_k}, \tilde{L}_{V|r_k})$ collecting the stochastic stationary distributions for the speed limits, conditioned on the road type $r_k$. The analytical expression for the random vector $\Pi_{V|r_k}$ as a function of $\tilde{P}_{V|r_k}$ and $\tilde{L}_{V|r_k}$ may be derived immediately from (17) and (18), by considering the single-step transition probabilities and the mean lengths, and is reported in Appendix VII-C2 for completeness. Accordingly, the conditional mean speed derived as in (19) should also be interpreted as a random variable.
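As an illustration of how the mean legal speed follows from the speed sign model, the sketch below computes the stationary distribution of a two-speed chain and the resulting expected speed. With only two speeds and no self-transitions, the jump matrix is necessarily a swap, so each stationary probability reduces to the corresponding mean length divided by the sum of the two mean lengths, which gives a simple check. The speeds and mean lengths are hypothetical.

```python
import numpy as np

def stationary_speed_distribution(P_V, L_V):
    """Stationary distribution of the speed sign chain for one road type."""
    P_V = np.asarray(P_V, dtype=float)
    L_V = np.asarray(L_V, dtype=float)
    G = P_V / L_V[:, None]                 # off-diagonal rates p_ij / L_i
    np.fill_diagonal(G, -1.0 / L_V)        # diagonal -1 / L_i
    n = len(L_V)
    A = np.vstack([G.T, np.ones(n)])       # balance equations plus normalization
    b = np.append(np.zeros(n), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]

def mean_legal_speed(P_V, L_V, v):
    """Expected legal speed: stationary probabilities weighted by the speed values."""
    return float(stationary_speed_distribution(P_V, L_V) @ np.asarray(v, dtype=float))

# Hypothetical two-speed road type: 70 and 90 km/h.
P_V = [[0.0, 1.0],
       [1.0, 0.0]]
L_V = [10.0, 30.0]   # mean lengths in km (assumed)
v = [70.0, 90.0]     # speed limits in km/h

pi_V = stationary_speed_distribution(P_V, L_V)
v_bar = mean_legal_speed(P_V, L_V, v)
print(pi_V, v_bar)
```

Here the road spends three quarters of its length at the higher limit, so the mean legal speed is pulled toward 90 km/h.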
All the stochastic models presented so far, together with the values for the corresponding parameters extracted from the OCEAN dataset, are summarized in Table 3.

C. MISSION LENGTH
The mission length is not technically a road parameter; nonetheless, it plays a fundamental role in the classification of a road transport mission, according to both the GTA and UFD representations. Therefore, it is included in the present paper.
In this context, it should be mentioned that daily driving distances are traditionally modeled as lognormal, Gamma, or Weibull distributions [60]. In the present paper, the focus is not on the distance travelled during the entire day, but rather on the individual mission length. For the sake of simplicity, a Gamma variable $\tilde{L}_m \sim \mathrm{Ga}(\alpha_{\tilde{L}_m}, \beta_{\tilde{L}_m})$, assuming values $L_m \in S_{\tilde{L}_m} \equiv \mathbb{R}_{>0}$, is used. This specific choice is again motivated by the fact that the sum of Gamma variables with the same rate parameter is still a Gamma variable. For the case of mission length, all the realizations are generated from a unique distribution, and thus the total length of any sequence of consecutive missions is also Gamma distributed with the same rate. This introduces some freedom in the definition of a mission itself, which could be interpreted as a single trip associated with a specific task, as well as a collection of trips or subtasks.
The comparison between the empirical and analytical distributions is shown in Fig. 11 for $\alpha_{\tilde{L}_m} = 1.31$ and $\beta_{\tilde{L}_m} = 0.016\,\text{km}^{-1}$. Accordingly, the mean value and the variance of the distribution amount approximately to $81.72\,\text{km}$ and $4.69 \cdot 10^{3}\,\text{km}^{2}$, respectively.
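For reference, the moments of a Gamma variable with shape $\alpha$ and rate $\beta$ are $\alpha/\beta$ and $\alpha/\beta^2$. The sketch below evaluates them for the rounded parameters quoted above and draws synthetic mission lengths; it also sums pairs of consecutive missions to illustrate that the total length remains Gamma distributed with doubled shape. With the rounded parameters, the closed forms give roughly 81.9 km and $5.1 \cdot 10^{3}$ km², of the same order as, though not identical to, the values reported; the difference may reflect rounding of the quoted parameters or a different estimator, which cannot be determined from the text.

```python
import numpy as np

alpha_Lm = 1.31    # shape parameter quoted in the text
beta_Lm = 0.016    # rate parameter quoted in the text, in 1/km

# Closed-form moments of a Gamma variable with shape alpha and rate beta.
mean_Lm = alpha_Lm / beta_Lm
var_Lm = alpha_Lm / beta_Lm ** 2

rng = np.random.default_rng(3)
# NumPy uses the shape/scale parametrization, so scale = 1 / rate.
lengths = rng.gamma(shape=alpha_Lm, scale=1.0 / beta_Lm, size=200_000)

# Total length of two consecutive missions: Gamma with shape 2 * alpha and the same rate.
pairs = lengths.reshape(-1, 2).sum(axis=1)

print(mean_Lm, var_Lm)
print(lengths.mean(), pairs.mean())
```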

IV. ANALYSIS AND CLASSIFICATION
The present Section attempts to answer the fundamental question about the representativeness of the OC, already formulated in Section I-B. More specifically, the basic idea behind the following investigation is to understand what type of usage the parametrized OC is representative of, concerning the road properties and their distributions, and how to exploit this knowledge to design and test more energy-efficient vehicles. In this context, the question is formalized in terms of a classification problem and addressed from the perspective of two already existing descriptions, namely the GTA and UFD systems. To this end, new stochastic variables, condensing the salient information about a road transport mission according to the models detailed in Section III, are introduced and analyzed in terms of distributions and moments. These new variables may be deduced directly from the GTA and UFD descriptions, and are henceforth referred to as composite variables, since they may be expressed as a function of the simple random variables introduced before.

A. COMPOSITE RANDOM VARIABLES
For a given mission, the statistical indicators used by the GTA and UFD description target road properties that may be reasonably anticipated to have a substantial impact on energy performance. Considering a whole population of transport missions, these indicators will eventually behave as random variables. Their distribution and moments may be determined, analytically (where and whether possible) or numerically, starting from those of the stochastic variables introduced in Section III.

1) ROAD TOPOGRAPHY
Starting with the model for topography discussed in Section III-B1, the road may be qualified considering two main indicators. The first, as already mentioned in Section III-B1, consists of the proportion between the road length for which the grade is lower than a specified threshold $y$ and the total length of the segment. The second relates to the mean hill length. In the first case, the analytical expression for the new random variable as a function of the conditional random standard deviations $\Sigma_{Y|r_i}$, $i = 1, \dots, n_r$, may be deduced by combining (5) with (10). The final formula is as follows:

$$P_y(y, \Sigma_Y, \tilde{P}_R, \tilde{L}_R) = \sum_{i=1}^{n_r} \Pi_{Ri}(\tilde{P}_R, \tilde{L}_R) \left[ 2\Phi\!\left(\frac{y}{\Sigma_{Y|r_i}}\right) - 1 \right], \tag{20}$$

where $\Phi(\cdot)$ is the CDF of the standard normal distribution and the vector $\Sigma_Y \triangleq [\Sigma_{Y|r_1} \dots \Sigma_{Y|r_{n_r}}]^T$ collects the conditional stochastic standard deviations. The relationship (21) for the random mean hill length may instead be derived by noticing that the generic $L_{h|r_i}$ can already be interpreted as a conditional expectation, which yields the weighted sum $\sum_{i=1}^{n_r} \Pi_{Ri}(\tilde{P}_R, \tilde{L}_R)\, \tilde{L}_{h|r_i}$, in which $\tilde{L}_h \triangleq [\tilde{L}_{h|r_1} \dots \tilde{L}_{h|r_{n_r}}]^T$ is a vector collecting the stochastic mean hill lengths. As already shown previously, both the GTA and UFD classification systems resort to an expression of the same type as in (20), whereas the mean hill length is disregarded.$^{6}$ Nevertheless, the latter parameter is integrated in the analysis, since the influence of the hill length on vehicular performance may in general be non-negligible, as pointed out in [9].

2) ROAD CURVINESS
For the curviness model, two different relationships, (22) and (23), may be derived based on (14) and (15), respectively. Comparing (23) to (22), it may be clearly noticed that the former relationship is more complete and involves all the characteristic parameters of the curviness model.

3) STOP SIGNS
With respect to the stop signs model, only the conditional intensities are assumed to play a role in the qualification of the transport mission, whereas the contribution of the minimum and maximum stop times is disregarded. This criterion, relating to the expected number of stops along the road, is present in the UFD classification system, but completely absent in the GTA representation. In particular, since the conditional intensities $\lambda_{s|r_i}$, $i = 1, \dots, n_r$, may be interpreted as the expected number of events per unit of length, the relationship (24) follows directly from the law of total expectation as the weighted sum $\sum_{i=1}^{n_r} \Pi_{Ri}(\tilde{P}_R, \tilde{L}_R)\, \Lambda_{s|r_i}$, where $\Lambda_s \triangleq [\Lambda_{s|r_1} \dots \Lambda_{s|r_{n_r}}]^T$.

4) SPEED SIGNS
Concerning the distribution of the legal speed along the road, the GTA and UFD propose different criteria, based on the expected number of transitions between speeds and the predominant speed value along the mission, respectively. Whilst the first approach leads to a rather complicated analytical expression that does not involve the values $v_{i|r_k}$ of the speed itself, the second one cannot be consistently used to uniquely qualify a road transport mission, as discussed more extensively in [50]. Therefore, in this paper, an alternative criterion, based on the mean legal speed and borrowed from [54], is used. The latter allows for ease of classification and interpretation, since it directly relates to the values $v_{i|r_k}$ of the speed limits. Owing to the premises above, the analytical expression (25) for the mean speed may be derived starting from (19) as the weighted sum $\sum_{i=1}^{n_r} \Pi_{Ri} \sum_{j=1}^{n_{v|r_i}} \Pi_{Vj|r_i}\, v_{j|r_i}$, where the stochastic variables $\Pi_{Vj|r_i}$ have been introduced in Section III-B4. Equations (20), (21), (22), (23), (24), and (25) provide the analytical expressions for the stochastic variables involved in the classification of an entire transport application. The latter may be conveniently classified by considering the expectation of each variable over the population of road missions. The composite variables, together with their interpretation in terms of expectations or probability, are summarized in Table 4.

B. NUMERICAL ANALYSIS AND CLASSIFICATION
The composite variables introduced in Section IV-A make it easy to classify both individual road missions and entire transport applications. In particular, the latter may be qualified depending on the mean value assumed by the quantities in (20), (21), (22), (23), (24), and (25), computed over the whole population of missions (Section IV-B1). This is the approach followed by both the GTA and UFD systems. On the other hand, whilst a single indicator may be sufficient to classify the transport application, more detailed information about the multivariate distribution of the composite variables may be useful for production planning and predictive maintenance (Section IV-B2).

1) CLASSIFICATION USING NUMERICAL EXPECTATIONS
Since the distributions of the simple random variables are known from Section III (Table 3), those of the composite variables may be calculated numerically or generated by simulation. It should be emphasized that, as opposed to the simple variables, the composite variables are correlated through the stationary probabilities $\Pi_{Ri}(\tilde{P}_R, \tilde{L}_R)$, $i = 1, \dots, n_r$, which implies the need to describe an entire transport application using a multivariate random vector, whose definition would, in turn, depend upon the specific choice of classification system (the mission length $\tilde{L}_m$, by contrast, is independent of the road models and of the composite variables). Specifically, taking inspiration from the GTA and UFD representations, respectively, this paper considers the random vectors $X_{\text{GTA}}$ and $X_{\text{UFD}}$, defined in (26) and (27), where the vector $X_{R_s}$ collects all the simple stochastic random variables for the models presented in Section III, i.e., $X_{R_s} \triangleq [\tilde{P}_R \; \tilde{L}_R \; \Sigma_Y \dots \tilde{P}_{V|r_1} \dots \tilde{P}_{V|r_{n_r}} \; \tilde{L}_{V|r_1} \dots \tilde{L}_{V|r_{n_r}} \; \tilde{L}_m]^T$. In (26), the first three stochastic components of the vector $X_{\text{GTA}} = X_{\text{GTA}}(X_{R_s})$ are used to classify the road with respect to the topography parameter (more specifically, the road grade length ratio). Indeed, referring to the example reported in Section II-A, an individual mission is labeled as FLAT if $P_y(3, \Sigma_Y, \tilde{P}_R, \tilde{L}_R) > 0.98$, P-FLAT if the previous condition is not met but $P_y(6, \Sigma_Y, \tilde{P}_R, \tilde{L}_R) > 0.98$, HILLY if both the previous criteria are not satisfied but $P_y(9, \Sigma_Y, \tilde{P}_R, \tilde{L}_R) > 0.98$, and V-HILLY otherwise. In this context, it should be observed that, for a given road transport mission, $y_1 \leq y_2 \Rightarrow P_y(y_1, \Sigma_Y, \tilde{P}_R, \tilde{L}_R) \leq P_y(y_2, \Sigma_Y, \tilde{P}_R, \tilde{L}_R)$, which follows from the fact that $P_y(y, \Sigma_Y, \tilde{P}_R, \tilde{L}_R)$ is actually a cumulative distribution function. Therefore, the criterion imposed on the topography according to the GTA classification system is always well-defined. The fourth component of $X_{\text{GTA}}$ also relates to the topography model, but targets instead the mean hill length. Finally, the last four components are used to label the mission with respect to the curviness (for which the criterion is expressed again in terms of probability), stop signs, speed signs, and mission length parameters. As opposed to the multivariate random variable $X_{\text{GTA}}$, the stochastic vector $X_{\text{UFD}} = X_{\text{UFD}}(X_{R_s})$ includes only one relationship for the road grade length ratio. In fact, according to the example adduced in Section II-A, the label is directly assigned considering the value assumed by the probability $P_y(2, \Sigma_Y, \tilde{P}_R, \tilde{L}_R)$ appearing in (27). The remaining components of $X_{\text{UFD}}$ relate to the mean hill length, curviness (for which the condition is formulated in terms of expectation), stop signs, speed signs, and mission length.
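The cascade of thresholds used for the GTA topography label can be written compactly. The sketch below implements the rule stated above (thresholds 3, 6, and 9 with a 0.98 level) and, for illustration only, evaluates it for a single road type whose grade is normally distributed with zero mean; the function names and the standard deviations used in the usage lines are hypothetical.

```python
import math

def grade_proportion(y, sigma_Y):
    """Share of road with |grade| <= y for a zero-mean normal grade with std sigma_Y."""
    return math.erf(y / (sigma_Y * math.sqrt(2.0)))

def gta_topography_label(p3, p6, p9, level=0.98):
    """Apply the cascading rule on the grade proportions at thresholds 3, 6 and 9."""
    if p3 > level:
        return "FLAT"
    if p6 > level:
        return "P-FLAT"
    if p9 > level:
        return "HILLY"
    return "V-HILLY"

def label_single_road_type(sigma_Y):
    """Label a mission made of one road type, characterized only by its grade std."""
    p3, p6, p9 = (grade_proportion(y, sigma_Y) for y in (3.0, 6.0, 9.0))
    return gta_topography_label(p3, p6, p9)

for sigma_Y in (1.0, 2.0, 3.0, 5.0):   # hypothetical standard deviations
    print(sigma_Y, label_single_road_type(sigma_Y))
```

Because the grade proportion is nondecreasing in the threshold, the three tests can be applied in sequence without ambiguity, which is the well-definedness property noted above.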
The difficulty of analyzing the multivariate random variables $X_{\text{GTA}}$ and $X_{\text{UFD}}$ resides in the fact that all the composite variables are correlated. Whereas a rigorous treatment may be prohibitive, and outside the scope of the present paper, approximate results may be derived following the method of propagation of uncertainty (the reader may refer to Section VII-D for further details). For the sake of brevity, the discussion is here restricted to considerations about the expected values of the vectors. Indeed, the transport application may be labeled by considering the expected values $E(X_{\text{GTA}}(X_{R_s}))$ and $E(X_{\text{UFD}}(X_{R_s}))$, respectively. The computation of the mean clearly yields $E(X_{\text{GTA}}(X_{R_s})) = [E(P_y(3, \Sigma_Y, \tilde{P}_R, \tilde{L}_R)) \dots E(\tilde{L}_m)]^T$, and analogously $E(X_{\text{UFD}}(X_{R_s})) = [E(P_y(2, \Sigma_Y, \tilde{P}_R, \tilde{L}_R)) \dots E(\tilde{L}_m)]^T$, which implies that a transport application may be simply classified by determining the marginal distributions and the expectations of each individual random variable contained in the vectors $X_{\text{GTA}}$ and $X_{\text{UFD}}$. In this paper, the marginal distributions for the composite variables presented in Section IV-A were estimated numerically by generating a population of 10000 synthetic sOCs (whose stochastic parameters are reported in a separate file). The results of this process are illustrated in Figure 12, where the marginal PDFs are plotted, along with the estimated mean value (solid grey lines) and the analytical expectation computed using a first-order approximation (dashed yellow lines), as explained in more detail in Section IV-C. In Figure 12, the cumulative distribution for the road grade length ratio is only plotted for $y = 2$ in $P_y(y, \Sigma_Y, \tilde{P}_R, \tilde{L}_R)$, i.e., according to the UFD description. In particular, it is interesting to observe how both the realizations $p_y$ and $p_\kappa$ range between zero and one: this could already be anticipated, since the composite variables $P_y$ and $P_\kappa$ denote a probability.$^{8}$
The mean values of the composite variables are listed in Table 5, where the corresponding GTA and UFD classes are also reported. As a general observation, it should be noticed that the notion of representativeness for the parametrized OC, evaluated in terms of composite variables, is heavily dependent on the choice of the classification system.
As a concluding remark, it is worth observing that using a single road type, i.e., n r = 1, implies that the composite variables are independent, 9 and, thus, the joint PMF becomes the product of the individual PMFs, at least for the vector X UFD (a more exhaustive discussion is reported in Section VII-D). This would enormously simplify the mathematical treatment. 10

2) MULTIVARIATE PMF ESTIMATION
Relying on more detailed information concerning the actual distribution of the road missions, according to some specified classification system, may facilitate the processes of production planning and predictive maintenance. In this context, the joint PMF for a given transport application, suitably parametrized in terms of sOCs, may be estimated numerically using the relationships established for the composite variables of Section IV-A, and prescribing a set of thresholds on each model in isolation. In this operation, the combinatorial nature of the problem and the resolution of the chosen classification system represent two key factors in determining the total number of possible combinations of classes. For what follows, it is worth emphasizing that estimating the joint PMF for the discrete case is, in fact, different from estimating the continuous multivariate PDF for the composite variables of Section IV-A. Specifically, in this paper, the joint PMF was estimated for the GTA and UFD classification systems considering a population of 10000 sOCs synthesized using the stochastic models detailed in Section III. The theoretical number of possible combinations may be calculated as 2304 for both the GTA and UFD descriptions, whereas the generated sOCs yielded a total of 325 and 344 combinations, respectively. Using the GTA description, the predominant combination is classified as HIGH for legal speed, P-FLAT for road grade, MEDIUM for mean hill length, LOW for curviness, RESIDENTIAL for stop signs, and L-DISTANCE for mission length, with a probability of occurrence close to 9.8%. Instead, the corresponding labels assigned by the UFD representation are HIGH for speed, FLAT for road grade, MEDIUM for mean hill length, HIGH for curviness, FLUID for stop signs, and L-DISTANCE for mission length, with a maximum probability of 9.4%. Again, the other probability values are reported in a separate file, and not discussed in this paper for the sake of brevity.
8 In reality, the expression for the generic probability p κ|r i in (15) is approximated (see [50] for further details). Therefore, the extremely rare event of generating negative curviness length ratios might actually happen. By simulation, the probability of such an occurrence was estimated to be around 1/1000.
9 Except for the road grade length ratios in (26).
10 It should be noticed that joint PDFs for the composite variables relating to the curviness model would still be very complicated to determine analytically.
VOLUME 11, 2023
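The numerical estimation of the joint PMF described above amounts to counting how often each combination of classes occurs in the labeled population. A minimal sketch is given below, assuming that every synthetic sOC has already been mapped to a tuple of class labels by the chosen classification system; the function name and the toy labels are illustrative and are not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Iterable


def estimate_joint_pmf(
    labels: Iterable[tuple[Hashable, ...]],
) -> dict[tuple[Hashable, ...], float]:
    """Estimate a joint PMF as relative frequencies of label combinations.

    Each element of `labels` is one mission, described by a tuple of class
    labels (one entry per dimension of the classification system).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one labeled mission is required")
    return {combo: n / total for combo, n in counts.items()}


# Toy population of four missions labeled on (legal speed, road grade).
toy_missions = [
    ("HIGH", "P-FLAT"),
    ("HIGH", "P-FLAT"),
    ("HIGH", "FLAT"),
    ("MODERATE", "P-FLAT"),
]
pmf = estimate_joint_pmf(toy_missions)
mode = max(pmf, key=pmf.get)  # predominant combination of classes
print(mode, pmf[mode], len(pmf))
```

The number of keys in the returned dictionary corresponds to the number of combinations actually observed in the population, which may be much smaller than the theoretical number given by the product of the class counts along each dimension.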
Instead, Figure 13 illustrates some joint PMFs obtained by only considering two variables at a time, that is, marginalizing over all the remaining dimensions of the problem. The resulting bivariate distributions are easier to visualize, given the reduced number of involved parameters, and may be used to elucidate some interesting differences between the two classification systems. Starting with the first set of histograms in Figure 13, showing the PMF for the mission length and road grade, most of the missions, representing nearly 41.9% of the total, are labeled as L-DISTANCE/P-FLAT using the GTA criteria, with the second most populated combination of classes being the REGIONAL/P-FLAT, covering an additional 27.7%. On the other hand, the UFD description places the majority of the missions in the L-DISTANCE/HILLY and L-DISTANCE/FLAT combinations (27.6% and 24.6%, respectively). In both cases, the combinations with STOP&GO are extremely rare. Similar proportions could already be expected from the analyses preliminarily conducted in Sections IV-B1 and IV-C, and align with the results obtained by considering the individual expectations of the composite variables. Generally speaking, it may also be observed that, excluding the P-FLAT class, the missions are more homogeneously distributed according to the UFD representation. This may be explained by noticing that the UFD system actually lacks the P-FLAT class. As a consequence, the missions labeled P-FLAT by the GTA description are divided almost equally between the FLAT and HILLY classes. Moreover, some of the missions qualified as HILLY using the GTA metrics belong instead to the V-HILLY class according to the UFD representation.
A similar effect, also due to the different resolution of the two classification systems, may be observed by looking at the bivariate PMF of the mission length and legal speed parameters. Indeed, with respect to the latter, the majority of the missions is labeled either MODERATE or HIGH using the thresholds prescribed by the GTA description, which misses the V-HIGH class. The UFD representation also distributes the transport missions mainly between the MODERATE and HIGH classes, but with an opposite tendency. The predominant combinations are the L-DISTANCE/MODERATE for the GTA system and L-DISTANCE/HIGH for the UFD (41.5 and 43.0% of the total number of missions, respectively). Finally, concerning the bivariate PMF for the mission length and curviness parameters, the resolution is the same for both classification systems. In this case, the substantial discrepancy between the two distributions should be ascribed to the different criteria used: whilst the GTA imposes a limit on the portion of the road for which the curvature exceeds a certain value, the UFD prescribes a sequence of thresholds on the expected number of curves, without specifying anything about the covered length. As a result, the predominant combinations are the L-DISTANCE/LOW for the GTA description (40.4%) and L-DISTANCE/HIGH for the UFD representation (38.0%). In any case, the general consideration is again as in Section IV-B1: the representativeness of the OC should be evaluated case-by-case according to the specific choice of the classification system. Additional details about the distributions plotted in Figure 13 are provided in Table 6.
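The bivariate PMFs discussed here are obtained from the full joint PMF by summing over the dimensions that are not of interest. A minimal sketch of this marginalization is reported below; the dictionary layout, the function name, and the probability values are assumptions made for illustration only and do not reproduce the data of Figure 13.

```python
from collections import defaultdict


def marginalize(
    joint_pmf: dict[tuple[str, ...], float], keep: tuple[int, ...]
) -> dict[tuple[str, ...], float]:
    """Sum a joint PMF over every dimension whose index is not in `keep`."""
    marginal: dict[tuple[str, ...], float] = defaultdict(float)
    for combo, prob in joint_pmf.items():
        marginal[tuple(combo[i] for i in keep)] += prob
    return dict(marginal)


# Made-up joint PMF over (mission length, road grade, legal speed).
toy_joint = {
    ("L-DISTANCE", "P-FLAT", "HIGH"): 0.4,
    ("L-DISTANCE", "P-FLAT", "MODERATE"): 0.1,
    ("REGIONAL", "P-FLAT", "HIGH"): 0.3,
    ("REGIONAL", "FLAT", "HIGH"): 0.2,
}
# Keep mission length (index 0) and road grade (index 1) only.
bivariate = marginalize(toy_joint, keep=(0, 1))
print(bivariate)
```

Applying the same function with a different pair of indices yields the other bivariate distributions, so that the two classification systems may be compared dimension by dimension.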

C. BUILDING A SINGLE REPRESENTATIVE sOC
A single representative sOC may be derived that embodies the salient properties of the entire transport application, using a reduced number of parameters. The intuitive approach proposed in this paper is to consider the sOC resulting from selecting the mean values of the sOC parameters, interpreted as random variables over the population of missions. Accordingly, the resulting reference sOC may be parametrized using the vector-valued mean µ X Rs . There are several reasons that justify such a choice. The first is that such a parametrized sOC would obviously be independent of the assumed classification system. In fact, if a representative sOC were constructed starting from the composite variables introduced in Section IV-A, it would be inherently and heavily conditioned by the structure of a specific bird's-eye-view description, which may however vary amongst vehicle manufacturers and road operators. Therefore, its practical relevance would be potentially limited to some specific application. Besides, it should be observed that the composite variables of Section IV-A incorporate many different sOC parameters. Therefore, defining a single reference sOC in terms of composite variables would ultimately determine the feasible combinations of sOC parameters, but would not provide their values explicitly. This aspect inherently relates to the non-bijective relationship between the different levels of the OC description. Finally, there is another advantage, more theoretical in nature, connected with constructing a representative sOC according to the outlined approach: the realizations for the composite variables may be interpreted as a zeroth-order approximation of their expected values over the population of road missions. 11 This last consideration makes it easy to evaluate the representativeness of the reference sOC also in terms of a specific bird's-eye-view description.
As an example, the expectations determined analytically using the zeroth-order approximation are plotted in Figure 12 (dashed yellow lines) for all the composite variables considered in this paper. Accordingly, the dashed yellow lines provide an intuitive understanding of where the reference sOC is located in the distribution of possible missions, and how distant it is from the mean sOC described in terms of composite variables. By looking at Figure 12, it may be generally concluded that, for the specific set of sOC parameters under consideration, the match between the two sOCs is quite satisfactory. Indeed, excluding the number of stops, the approximated expectations exhibit negligible relative errors (usually below 5%) compared to those estimated numerically. Moreover, classifying the transport application using the approximated values for the expectations yields the same combination of classes as in Section IV-B, the only exception concerning the stop signs, for which the label FLUID is assigned.

V. APPLICATIONS
The present Section discusses some practical applications of the whole OC edifice. The examples adduced in the following should be understood as illustrative, and mainly serve the purpose of demonstrating the potential of the OC format. Two problems of a different nature are considered in this paper: the first concerns accurate simulation of vehicular performance, in the contexts of certification or early design; the second deals with the process of optimal production planning.

A. ASSESSING VARIATION IN ENERGY PERFORMANCE
Starting from a fully parametrized transport application, vehicular performance may be easily assessed in a virtual environment by combining the OC with vehicle and driver models. If a detailed study is required, a large population of road missions may be generated directly by simulating (in a statistical sense) the stochastic models introduced in Section III, which permits an accurate representation of variation in usage and road characteristics. This type of analysis would be more suited when referring to a certain usage in terms of energy efficiency, relating for example to a specific geographical area. Moreover, since the stochastic parameters are allowed to vary over the population of missions that define the transport application, the influence of different road parameters (including, e.g., the composite variables introduced in Section IV-A) may be investigated comprehensively. Such an investigation is deliberately omitted from the present paper, but may be carried out as done in [47] and [49].
On the other hand, considering a single reference sOC may be sufficient when dealing with certification and preliminary assessments of new vehicle designs. In this context, it should be clarified that a unique sOC may still describe variation in usage, and actually, multiple dOCs need to be synthesized starting from the same sOC to accurately predict spread in performance. Figure 14 compares the CO2 emissions estimated by generating and then simulating multiple dOCs in the VehProp environment (the reader may refer to Appendix VII-E for further details about the implementation and computational aspects). More specifically, the blue histogram shows the distribution of pollutant emissions obtained by simulating 10000 dOCs, randomly synthesized from the same population of 10000 sOCs as in Section IV. By contrast, the orange histogram refers to a population of 300 dOCs 12 generated from the same reference sOC parametrized in Section IV-C. In both cases, the missions were mirrored to ensure a balance in the potential (gravitational) energy. Consistently with the scope of the paper, Figure 14 has been produced considering a Volvo FH13 vehicle model, equipped with a diesel engine, an actuated, stepped gearbox with 12 forward gears, and a kerb weight of 7540 kg. These specifications are, in fact, typical of heavy-duty trucks of the kind commonly used for long-haul missions. Additional information about the vehicle's configuration can be found in the Volvo datasheets. 13 Along with the actual distributions of CO2 emissions, the mean values are also reported in Figure 14, evaluated for both the entire population of stochastically generated sOCs (solid grey line), and the representative sOC parametrized in Section IV-C (dashed yellow line). The values computed numerically amount to 1708 and 1729 $\text{g}\,\text{km}^{-1}$, respectively.
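The paper does not detail how the missions were mirrored. One plausible reading, sketched below purely as an assumption, is that each mission is driven out and back, so that the altitude profile is followed by its own reversal and the net change in potential energy over the mirrored mission vanishes. The function name and the sample profile are hypothetical.

```python
def mirror_altitude_profile(altitude_m: list[float]) -> list[float]:
    """Append the reversed profile so that the mission ends where it started.

    Assumption: mirroring means an out-and-back mission. The turning point
    is not duplicated, hence the slice starts at the second-to-last sample.
    """
    if not altitude_m:
        raise ValueError("the altitude profile must not be empty")
    return altitude_m + altitude_m[-2::-1]


outbound = [100.0, 120.0, 90.0]  # made-up altitude samples in metres
mirrored = mirror_altitude_profile(outbound)
net_elevation_change = mirrored[-1] - mirrored[0]
print(mirrored, net_elevation_change)
```

With this construction every climb on the outbound leg is matched by an equal descent on the return leg, which is the balance in gravitational energy that the mirroring is meant to guarantee.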
The difference between the two estimates is negligible, which may be explained informally as follows. First, the existence of a function f CO2 (·) may be postulated describing the mean CO2 emissions calculated over a population of dOCs, synthesized from the same sOC. In general, the analytical expression for such a function may not be known a priori but, for a fixed set of sOC parameters, it would be deterministic, and dependent solely upon the sOC parameters, the vehicle's specifications, and possibly the driver's behaviour. Then, treating the sOC parameters as random variables, and with a similar rationale as in Section IV-C, its expectation over the population of missions may be approximated as E(f CO2 (X R s )) ≈ f CO2 (E(X R s )) ≡ f CO2 (µ X Rs ). This result, which may perhaps appear trivial, asserts that the average emissions may be estimated simply by simulating an adequately large number of dOCs generated from the representative sOC, parametrized by the vector-valued mean parameters µ X Rs . This observation allows the number of needed simulations to be reduced considerably, compared to synthesizing a larger population of missions that explicitly consider the variation of sOC parameters. Moreover, since the above approximation is valid for any other function, it may be conjectured that, in order to estimate the mean value of a quantity of interest, simulating a number of dOCs originating from the representative sOC defined as in Section IV-C should always be sufficient. However, it is evident from Figure 14 that a similar argument does not apply to the variances and standard deviations. Indeed, accounting for variation between road transport missions, which mathematically translates into the variation of the sOC parameters, produces a much larger spread in performance than considering a single representative sOC. In fact, the standard deviations estimated from simulating the complete transport application and the reference sOC amount to 360 and 117 $\text{g}\,\text{km}^{-1}$, respectively.
12 This number of dOCs was empirically found already to yield sufficiently good estimates for the mean energy consumption and CO2 emissions in [47] and [49].
13 https://www.volvotrucks.se/sv-se/trucks/trucks/volvo-fh16/specifications/data-sheets.html
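The informal argument above may be illustrated with a toy experiment. The sketch below is not the VehProp simulation: it assumes a hypothetical emission function of a single scalar sOC parameter, a Gaussian spread of that parameter over the population of missions, and Gaussian scatter between dOCs generated from the same sOC. All numerical values are invented for the purpose of the illustration.

```python
import random
import statistics


def toy_emission(x: float) -> float:
    """Hypothetical mean CO2 emission (g/km) for a scalar sOC parameter x."""
    return 1700.0 + 300.0 * x + 10.0 * x * x


def simulate(n_missions, sample_x, noise_std, rng):
    """Draw one sOC parameter per mission and add dOC-level scatter."""
    return [
        toy_emission(sample_x(rng)) + rng.gauss(0.0, noise_std)
        for _ in range(n_missions)
    ]


rng = random.Random(0)
# Complete application: the sOC parameter varies between missions.
population = simulate(2000, lambda r: r.gauss(0.0, 1.0), 120.0, rng)
# Reference sOC: the parameter is frozen at its mean value (zero here).
reference = simulate(300, lambda r: 0.0, 120.0, rng)

mean_gap = abs(statistics.fmean(population) - statistics.fmean(reference))
print(f"relative gap between means: {mean_gap / statistics.fmean(population):.3%}")
print(f"std, complete application: {statistics.stdev(population):.0f} g/km")
print(f"std, reference sOC:        {statistics.stdev(reference):.0f} g/km")
```

In this toy setting the two means nearly coincide because the emission function is only mildly nonlinear around the mean parameter, whereas the spread of the complete application also contains the contribution due to the variation of the sOC parameter between missions, which the reference sOC cannot reproduce by construction.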

B. OPTIMAL SELECTION AND PRODUCTION PLANNING
The second example discussed in this Section elucidates the application of the OC format for the purposes of optimal design selection and production planning. The idea is to exploit the information about the distribution of the road transport missions, concerning, e.g., a specific geographical area, to estimate the proportion of vehicles to be delivered to that market. The assessment would clearly be only preliminary but may provide important indications on the required volumes, with obvious advantages in the planning for the setting of the production lines. For example, Volvo aims at replacing its current fleet of trucks with battery-electric vehicles (BEVs), integrated with automated driving systems, by 2040. In this context, the sOC parametrized in this paper may be thought of as representative of the usage for the region of Västra Götaland, concerning heavy-duty rigid and tractor-semitrailer vehicles.
By labeling the missions based on the internal GTA description, an optimal configuration may thus be found using ad-hoc algorithms for every combination of classes. Typically, this operation would involve the selection of a number of representative sOCs, the synthesis of dOCs, and their simulation in virtual environments, in conjunction with appropriate models for vehicle dynamics. Once the design has been optimized, the proportions of vehicles to be produced may subsequently be deduced from the distribution of missions, provided that the total volume has been estimated in advance, based, for example, on already placed orders or historical data. The spread in performance within the entire transport application may then be evaluated a posteriori, as done before in Section V-A. To simplify the problem, this paper only considers three parameters, namely the mission length, road grade, and mean legal speed, and two vehicle configurations, 14 that is, rigid truck (RT) and tractor-semitrailer (TS). These are the same vehicle types as those in Appendix VII-A. Figure 15 illustrates the actual distribution of missions for the same population of 10000 road operations synthesized in Section IV. Owing to the premises above, optimal solutions for each combination of classes have been derived in [54] and [61] using a particle swarm optimization algorithm. Concerning the predominant combinations of classes, the corresponding proportions are listed in Table 8 for both variants, along with the number and type of electric motors (EMs), and battery packs (BPs). The interpretation for the EM and BP types in Table 8 is the same as in [54]. Exhaustive information about the optimal configurations and propulsion systems for the considered combinations of classes may be found in [61], where the results of a comprehensive sensitivity analysis are also presented.
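The step of deducing production volumes from the distribution of missions may be sketched as a simple proportional allocation. The example below assumes that the number of missions falling into each class (or assigned to each optimal configuration) is known, and that the total volume has been fixed in advance; the function name, the largest-remainder rounding rule, and the counts are illustrative choices rather than the procedure adopted in [54] and [61].

```python
def allocate_volume(mission_counts: dict[str, int], total_volume: int) -> dict[str, int]:
    """Split an integer production volume proportionally to mission counts.

    Largest-remainder rounding is used so that the allocated units always
    add up exactly to `total_volume` (an illustrative design choice).
    """
    n_missions = sum(mission_counts.values())
    if n_missions <= 0 or total_volume < 0:
        raise ValueError("need a positive mission count and a non-negative volume")
    allocation = {k: total_volume * c // n_missions for k, c in mission_counts.items()}
    remainders = {k: total_volume * c % n_missions for k, c in mission_counts.items()}
    leftover = total_volume - sum(allocation.values())
    for key in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        allocation[key] += 1
    return allocation


# Made-up counts: 7000 missions best served by TS, 3000 by RT.
plan = allocate_volume({"TS": 7000, "RT": 3000}, total_volume=10)
print(plan)
```

Integer arithmetic is used throughout to avoid rounding surprises when the proportions are not exactly representable in floating point.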

VI. DISCUSSION
The present paper has constituted a first attempt at a complete parametrization (concerning the road models) of an entire transport application, within the theoretical framework provided by the OC representation. The contribution is articulated in three main parts. The first coincides with the development of a set of stochastic models to describe the variation in usage amongst the road missions. The focus is on heavy-duty trucks, traveling long distances in the region of Västra Götaland, Sweden. The second is represented by the numerical analysis of the resulting OC, and its classification, according to two other simplified descriptions, namely the GTA and UFD systems. Finally, the third contribution resides in the exemplification of the potential of the OC format, with respect to the processes of vehicle design optimization, virtual testing, and production.
The present Section pauses to reflect upon the fundamental questions raised by the manuscript, along with their contextual interpretations and implications.
In this paper, an OC has been developed and parametrized limited to long-haul heavy-duty vehicles. Concerning the parametrization process, data have been collected from trucks operating in the region of Västra Götaland, combining available measurements with ad-hoc estimation techniques. Moreover, weather and traffic models have not been considered in the present work, mainly because of the difficulties connected with the estimation of the related quantities. However, a complete OC description should suitably be integrated with information regarding also weather and traffic, similar to what was done in [49]. Also, road parameters may be more accurately parametrized by collecting information from open-source databases, like OpenStreetMap, 15 in conjunction with GPS measurements. This possibility has been deliberately disregarded in the paper. Generally, it should be acknowledged that the proposed OC is only representative of a certain usage (i.e., that of heavy trucks traveling long distances), whereas extensions may always be possible, as corroborated by the flourishing research dealing with the synthesis of conventional driving cycles.
In the very same context, it is worth recalling some observations from the introduction of this paper, to better highlight the advantages of the OC over the conventional driving cycle representation. More specifically, it appears that the OC format currently qualifies as the only description capable of satisfactorily addressing all three problems stated in Section I, namely the representation, variation, and classification problems. Indeed, concerning the micro-trip and pattern classification methods [23], [26], [27], [28], [29], [30], [31], [34], [35], [36], [37], it is clear that the variation and classification aspects cannot be covered, since details about the operating environment are usually not included in the generated driving cycles. On the other hand, whereas some of the techniques listed in Table 1 (in particular, the modal construction and segment-based approaches [4], [18], [24], [25], [32], [33], [38]) permit the synthesis of driving cycles that account for the effect of the surroundings, these do not respond to the need for an easily interpretable description of the vehicle's usage. In fact, the influence of external factors like road grade, traffic density, and weather conditions is considered only implicitly by such driving cycles. In turn, this implies the impossibility of classifying a road transport mission based on the characteristics of the environment where it takes place. Moreover, even the segmentation and modal construction methods, which qualify perhaps amongst the most sophisticated techniques, do not allow for stochastic generation of equivalent driving cycles. As observed in [32], [33], and [39], this aspect becomes crucial when assessing variation in performance caused by fluctuations in the operating conditions, whilst also being a fundamental prerequisite to ensure robustness.
From this perspective, whilst the approach pioneered in [32] and [33] seems to have addressed both the representation and variation problems adequately, again it does not seem compatible with the necessity for the classification of road missions.
Therefore, in spite of the present limitations, from a conceptual perspective, claiming that the OC format has come to maturity does not seem an incautious statement. The stochastic models introduced in this paper, in particular, have been motivated, wherever possible, based on their implications and ease of interpretation. In all the other cases, the principle of parsimony has been invoked, in line with the philosophy behind the whole OC machinery. It is also the authors' intention to clarify that the models proposed here are by no means definitive, and other formulations may work equally well or even better on different datasets. With similar arguments, the format itself is not necessarily complete, and may always be improved or modified to accommodate needs that have not been identified so far. Whilst the possibility of building an all-embracing theory for the OC description might undoubtedly be seductive, such a colossal venture is not attempted in this paper. Besides, modularity and replaceability have been inspiring qualities since the very early conception of the OC framework.

B. ON THE NOTION OF REPRESENTATIVENESS, AND ITS IMPLICATIONS
Since an OC describes a road transport mission directly concerning the operating environment, in this paper, the notion of representativeness has been discussed in relation to the characteristics of the surroundings. The fundamental question motivating the investigation has been formulated in the introductory Section I-B and phrased as: ''What type of usage is a certain OC representative of?''. For the problem under consideration, the interrogation should be clearly interpreted in the sense of road characteristics and mission length. In this paper, an answer to the previous question has been sought by considering two existing bird's-eye view descriptions: the GTA and UFD classification systems developed by Volvo and Scania. The differences and similarities between these two representations have highlighted the circumstantial nature of the notion of representativeness, and perhaps the need for a unified approach.
Adducing again the example of Volvo and Scania, the construction of their classification systems was presumably instrumental to the processes of optimal design and selection of energy-efficient vehicles. In this context, the notion of representativeness is clearly evaluated based on an underlying metric, that is, the energy performance of road vehicles. Even so, the GTA and UFD labeled the OC parametrized in this paper differently. In Sections IV-B1 and IV-B2, this discrepancy has been ascribed to the different criteria specified by the two descriptions concerning, for example, road grade and curvature. In this context, the disagreement between the two representations may prove particularly problematic when considering the interaction with other entities (for example, for certification and urban planning purposes [62]). The wish is that vehicle manufacturers and governmental agencies could team up and agree on the building of a unique description that would leave no room for misunderstanding. Scientists and academic scholars should also be aware of such a delicate aspect, and could actively participate in the process.
Such considerations about the notion of representativeness are not restricted to the scope of the operating cycle. Transient driving cycles should also be developed considering the internal systems adopted by vehicle manufacturers, to facilitate the integration with their development and production processes.

C. CERTIFICATION AND PERFORMANCE EVALUATION
In this paper, two methods to assess the energy consumption and pollutant emissions of road vehicles have been investigated, based on the OC description. The first includes detailed information about the distribution of road missions, accounting for variation occurring over the entire population of transport operations. The second is based on a simplified approach and only considers a single representative sOC, selected opportunely and independently of the chosen classification system.
A specific method may be more suited for a certain application. For example, concerning the certification process, the simplified approach could certainly replace the conventional representation in terms of a driving cycle, if the focus is on mean quantities. This conclusion was supported in Section V by both the numerical and informal mathematical analyses conducted with respect to the pollutant emissions. Moreover, since a single sOC would still be able to reproduce variation in performance, information about the expected energy consumption and CO 2 emissions could be integrated more conveniently with additional indicators, e.g., variance or standard deviation. However, considering the entire transport application would yield more accurate results in this case.
In this context, it should be observed that the same result may also be achieved by considering the stochastic driving cycles proposed in [32], [33], and [39].
On the other hand, vehicle manufacturers may prefer testing vehicles considering the actual distribution of sOCs within a transport application, especially if log data are collected and stored in internal databases. To ensure consistency between the two approaches, however, a fundamental prerequisite would be the absence of any ambiguity in the translation between the corresponding descriptions. This happens again to relate to the notion of representativeness discussed previously.

D. CHALLENGES AND ADVANTAGES RELATING TO OC-BASED VEHICLE DESIGN OPTIMIZATION, PRODUCTION PLANNING, AND PREDICTIVE MAINTENANCE
The aspect of vehicle design optimization, albeit mentioned in the paper, has not been explored fully in the present study. In this context, compared to employing the conventional description in terms of driving cycles, the process of design optimization conducted using the OC format would be intuitively more robust, since it would explicitly account for variation in the operating conditions. Considering only a few missions, the computational effort would also be comparable to that of resorting to a common driving cycle (see discussion in [54]), with the additional advantage of the OC being a more accurate and reliable description of vehicle usage [63]. However, it is worth mentioning that the need for increasing robustness would imply simulating a larger population of dOCs. With these premises, the main drawback of using a large number of dOCs is that several road transport missions would need to be simulated whilst running an optimization routine, with high computational cost. Whereas this may be prohibitive with the current technologies, the imminent revolution of quantum computing could enable fast optimization in the next few years, possibly rendering the conventional driving-cycle-based approaches obsolete.
Concerning instead production planning and predictive maintenance, the problem is much simpler, since simulating a large number of dOCs is not particularly expensive. 16 Therefore, once the number of possible configurations is set, applying similar rationales as those discussed in the present paper becomes relatively straightforward when it comes to production planning. Analogously, predictive maintenance and control algorithms may be easily run and tested using the OC description, as done again in [54], where the number of stochastic models was however limited compared to the present study.
16 Simulating a single dOC of 250 km takes around 1 min on a personal computer.

VII. CONCLUSION
In this paper, an OC for long-haul heavy-duty vehicles, complete with stochastic road models, has been developed using log data collected from trucks operating in the region of Västra Götaland, Sweden. The proposed description allows for the representation of energy usage in a realistic manner and captures variation between road transport missions by resorting to a statistical approach.
The representativeness of the fully parametrized OC has been evaluated and discussed concerning two existing classification systems in use by Volvo and Scania, respectively. According to both descriptions, the considered transport application is labeled as long distance, but other parameters are judged differently. A methodology has also been proposed to synthesize a single representative sOC, described in terms of mean parameters, starting from the complete population of road missions.
As an application of the developed OC, two different examples have been adduced in the paper. The first was aimed at assessing energy efficiency in simulation, considering both the actual distribution of road missions determining the application and the mean representative operation. Concerning specifically the Volvo FH13 employed in the study, it was demonstrated in simulation that the mean CO2 emissions computed numerically according to the two different approaches were similar (amounting to 1708 and 1729 $\text{g}\,\text{km}^{-1}$, respectively), whereas a large discrepancy could be observed with respect to the predicted values for the standard deviation (360 and 117 $\text{g}\,\text{km}^{-1}$). The second example dealt instead with the process of optimal design and selection for future battery-electric vehicles, depending on the characteristics of the intended usage. The optimal proportion was characterized by a predominance of tractor-semitrailer trucks (70%), equipping 4 motors and 11 battery packs.
the authors read and approved the original version of the manuscript.

A. THE OCEAN DATASET
The stochastic models presented in this paper were parametrized using the OCEAN database. The database was created as part of the OCEAN project, and contains data from 33 different long-haul heavy-vehicle configurations, mainly operating in the region of Västra Götaland, Sweden. The properties of each truck, including its identifier, maximum torque and axle configuration are listed in Table 9.
The database is organized into directories, one for each truck (A1, A2, . . . , D2). In each directory, the data are organized by log files, which correspond to the routes traveled by the truck. In total, the database contains 1872 log files, each comprising 34 signals, including GPS coordinates, yaw rate measurements, brake pedal position, et cetera. The files are quite unevenly distributed amongst the different vehicles and cover a period of time corresponding to a typical quarter.
The quantities required to fit the models presented in Section III were estimated from the available measurements using different techniques. For example, whilst the total distance could be deduced immediately from log data, the topography parameters needed to be recovered using statistical tools, like the WAFO package [64], [65]. The road curviness was also inferred from the yaw rate, by postulating simple equilibrium equations relating the local curvature to the acceleration. Finally, the legal speed was estimated by filtering out the influence of the previous parameters from the resulting speed profiles, using ad-hoc algorithms. A viable alternative would be to retrieve the information about the speed signs directly from an open-source database, like OpenStreetMap, using GPS signals.
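As an illustration of the yaw-rate step described above, the following minimal sketch assumes the steady-state relation $a_y = v\,\dot{\psi} = v^2 \kappa$ between lateral acceleration, speed, yaw rate and curvature, which gives $\kappa = \dot{\psi}/v$. The function name, the low-speed cut-off `min_speed` and the sample values are hypothetical and are not taken from the OCEAN processing chain.

```python
import numpy as np

def curvature_from_yaw_rate(yaw_rate, speed, min_speed=1.0):
    """Estimate signed curvature (1/m) from yaw rate (rad/s) and speed (m/s).

    Assumes steady-state cornering, a_y = v * yaw_rate = v**2 * kappa,
    hence kappa = yaw_rate / v. Samples slower than min_speed are set to
    NaN because the ratio is ill-conditioned near standstill.
    """
    yaw_rate = np.asarray(yaw_rate, dtype=float)
    speed = np.asarray(speed, dtype=float)
    kappa = np.full_like(yaw_rate, np.nan)
    valid = speed >= min_speed
    kappa[valid] = yaw_rate[valid] / speed[valid]
    return kappa

# Hypothetical log samples: two cornering points and one near-standstill point
kappa = curvature_from_yaw_rate([0.1, 0.2, 0.05], [20.0, 20.0, 0.5])
```

In practice the raw yaw rate signal would likely need low-pass filtering before the division; that step is omitted here.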

B. THE GTA AND UFD CLASSIFICATION SYSTEMS
The GTA and UFD classification systems developed by Volvo and Scania are conceived as high-level representations targeting individual road missions, as well as entire transport applications. They have been refined over the years during an iterative process, and are intended to facilitate the interaction with the customer during the selection and sales stages. To this end, they describe the operating environments using colloquial tones and statements, which may however be reformulated in terms of statistical indicators, including mathematical expectations and probabilities. Imposing limits and thresholds on these, the GTA and UFD systems build a discrete representation of the usage. Consequently, a road transport mission may be qualified by resorting to a countable number of labels, each of them relating to a specific operating class. More specifically, the formalization of both descriptions has been worked out in [50], where the analytical expressions for the operating classes have been deduced for all the road models presented in this paper.
For the sake of brevity, the original formulation of each operating class is not discussed in this paper, whereas Table 10 lists the classes specified by both the GTA and UFD representations in mathematical form, with respect to the different stochastic models discussed in the paper. The quantities appearing in Table 10 represent the realizations of the composite variables introduced in Section IV-A, with an obvious convention for the notation. It is worth mentioning that the original classification systems lack the mean hill length parameter and, concerning the topography, qualify a mission entirely on the basis of the road grade length ratio. Moreover, the GTA description does not include any indication about the mean speed. In this paper, however, these parameters have been added to allow for a fair comparison with the UFD system. In this context, the values for the speed limits have been inspired by the stochastic model for the road type. Finally, the value $\kappa = 0.008\,\text{m}^{-1}$ appearing in both the expressions for the expected number of curves and the curviness length ratio corresponds to a speed reduction of nearly 20% when driving at 70 $\text{km}\,\text{h}^{-1}$ [50].
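To make the role of the curvature threshold more concrete, the sketch below computes two curviness indicators from a sampled curvature sequence. It assumes that the curviness length ratio can be read as the share of road length on which the absolute curvature exceeds $\kappa = 0.008\,\text{m}^{-1}$, and that a curve is counted each time the sequence rises above that threshold; the formal definitions are those given in [50], and the function name and sample data are hypothetical.

```python
import numpy as np

KAPPA_THRESHOLD = 0.008  # 1/m, threshold value quoted in the text

def curviness_indicators(kappa, sample_length, threshold=KAPPA_THRESHOLD):
    """Return (length ratio above threshold, curves per km).

    kappa         : signed curvature samples (1/m), equally spaced
    sample_length : spacing between samples (m)
    """
    above = np.abs(np.asarray(kappa, dtype=float)) > threshold
    length_ratio = float(above.mean())
    # A curve starts wherever the sequence crosses from below to above
    # the threshold; a sequence starting above it counts as one curve.
    starts = int(np.count_nonzero(above[1:] & ~above[:-1])) + int(above[0])
    total_km = above.size * sample_length / 1000.0
    return length_ratio, starts / total_km

# Hypothetical 1 km stretch sampled every 100 m, containing two curves
ratio, curves_per_km = curviness_indicators(
    [0, 0.01, 0.01, 0, 0, -0.02, 0, 0, 0, 0], sample_length=100.0
)
```

Indicators of this kind could then be compared against the class limits of Table 10 to assign a label.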

C. ANALYTICAL EXPRESSIONS FOR THE STATIONARY DISTRIBUTIONS OF ROAD TYPES AND SPEED SIGNS
This Section provides the analytical expressions for the stationary distributions for road types and speed signs, together with the corresponding random versions.

1) STATIONARY DISTRIBUTIONS FOR ROAD TYPES
Starting with (3a) and (4), for $n_r = 3$, the stationary distribution $\pi_R = \pi_R(P_R, L_R)$ may be derived in components as in (28). From (28), the stochastic stationary distribution, regarded as a function of the stochastic transition matrix and the stochastic mean length vector for the road types and defined over the entire population of missions, is given in (29).
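The closed-form components in (28) can be cross-checked numerically. The sketch below assumes that $P_R$ acts as the transition matrix of an embedded jump chain with zero diagonal and that the stationary distribution weights the jump-chain stationary probabilities by the mean lengths $L_R$, which is consistent with the structure of (28). The numerical values of the matrix and the lengths are hypothetical.

```python
import numpy as np

def stationary_distribution(P, L):
    """Length-weighted stationary distribution for n road types.

    P : (n, n) jump-chain transition matrix (rows sum to 1, zero diagonal)
    L : (n,) mean lengths of the road types
    """
    P = np.asarray(P, dtype=float)
    L = np.asarray(L, dtype=float)
    n = P.shape[0]
    # Solve nu P = nu together with sum(nu) = 1 in the least-squares sense
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    nu, *_ = np.linalg.lstsq(A, b, rcond=None)
    weighted = nu * L
    return weighted / weighted.sum()

def closed_form_three_types(P, L):
    """Components of (28) for three road types."""
    p12, p21 = P[0][1], P[1][0]
    p31, p32 = P[2][0], P[2][1]
    terms = np.array([
        L[0] * (p21 * p32 + p31),
        L[1] * (p12 * p31 + p32),
        L[2] * (1.0 - p12 * p21),
    ])
    return terms / terms.sum()

# Hypothetical transition probabilities and mean lengths (km)
P_R = [[0.0, 0.7, 0.3], [0.6, 0.0, 0.4], [0.5, 0.5, 0.0]]
L_R = [10.0, 5.0, 2.0]
pi_numeric = stationary_distribution(P_R, L_R)
pi_closed = closed_form_three_types(P_R, L_R)
```

The two results should agree to numerical precision; the numerical route also extends to more than three road types, for which closed forms become unwieldy.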

2) STATIONARY DISTRIBUTIONS FOR SPEED SIGNS
For a given road type $r_k$, the stationary distribution $\pi_{V|r_k}$ for the speed signs may be found from (17) and (18). For $n_{v|r_k} = 3$, the stationary distribution $\pi_{V|r_k} = \pi_{V|r_k}(P_{V|r_k}, L_{V|r_k})$ may be derived in components as in (30), and for $n_{v|r_k} = 2$ as in (31). Accordingly, the stochastic counterparts read as in (32) and (33).

D. APPROXIMATED ANALYSIS FOR VARIANCE
The correlation between the composite variables introduced in Section IV may be studied by resorting to approximate analytical approaches. In particular, using the propagation of uncertainty technique, the covariance matrix $\boldsymbol{\Sigma}_{X_{\text{GTA}} X_{\text{GTA}}}$ for the GTA vector $X_{\text{GTA}} = X_{\text{GTA}}(X_{Rs})$ of composite random variables may be deduced approximately as

$$\boldsymbol{\Sigma}_{X_{\text{GTA}} X_{\text{GTA}}} \approx \mathbf{J}_{X_{\text{GTA}}} \boldsymbol{\Sigma}_{X_{Rs} X_{Rs}} \mathbf{J}_{X_{\text{GTA}}}^{\mathrm{T}}, \tag{34}$$

where $\mathbf{J}_{X_{\text{GTA}}}$ is the Jacobian of the vector function $X_{\text{GTA}}(\cdot)$ with respect to the set of simple sOC random variables $X_{Rs}$, and $\boldsymbol{\Sigma}_{X_{Rs} X_{Rs}}$ denotes the covariance matrix of the simple variables. The expression for the UFD system is analogous, with a similar notation for the Jacobian matrix:

$$\boldsymbol{\Sigma}_{X_{\text{UFD}} X_{\text{UFD}}} \approx \mathbf{J}_{X_{\text{UFD}}} \boldsymbol{\Sigma}_{X_{Rs} X_{Rs}} \mathbf{J}_{X_{\text{UFD}}}^{\mathrm{T}}. \tag{35}$$

It should be observed that, whilst the Jacobians appearing in (34) and (35) are different, the matrix $\boldsymbol{\Sigma}_{X_{Rs} X_{Rs}}$ is the same for both relationships. In particular, the matrix $\boldsymbol{\Sigma}_{X_{Rs} X_{Rs}}$ may be put in diagonal form as in (36), where all the matrices are diagonal by model construction, except those for the stochastic transition matrices for the road types and conditional speed signs, which may be in turn decomposed in diagonal form as in (37) and (38).

$$\pi_{R1}(P_R, L_R) = \frac{L_{R1}(p_{R21}p_{R32} + p_{R31})}{L_{R1}(p_{R21}p_{R32} + p_{R31}) + L_{R2}(p_{R12}p_{R31} + p_{R32}) + L_{R3}(1 - p_{R12}p_{R21})}, \tag{28a}$$

$$\pi_{R2}(P_R, L_R) = \frac{L_{R2}(p_{R12}p_{R31} + p_{R32})}{L_{R1}(p_{R21}p_{R32} + p_{R31}) + L_{R2}(p_{R12}p_{R31} + p_{R32}) + L_{R3}(1 - p_{R12}p_{R21})}, \tag{28b}$$

$$\pi_{R3}(P_R, L_R) = \frac{L_{R3}(1 - p_{R12}p_{R21})}{L_{R1}(p_{R21}p_{R32} + p_{R31}) + L_{R2}(p_{R12}p_{R31} + p_{R32}) + L_{R3}(1 - p_{R12}p_{R21})}. \tag{28c}$$

$$\pi_{V3|r_k}(P_{V|r_k}, L_{V|r_k}) = \frac{L_{V3|r_k}(1 - p_{V12|r_k}p_{V21|r_k})}{L_{V1|r_k}(p_{V21|r_k}p_{V32|r_k} + p_{V31|r_k}) + L_{V2|r_k}(p_{V12|r_k}p_{V31|r_k} + p_{V32|r_k}) + L_{V3|r_k}(1 - p_{V12|r_k}p_{V21|r_k})}. \tag{30c}$$

TABLE 10. Operating classes according to the GTA and UFD classification systems. The limits on the mean hill length $L_h$ are expressed in m; those for the expected number of curves and stop signs, $n'_C$ and $n_s$, respectively, in $\text{km}^{-1}$; the thresholds on the mean legal speed $v$ are prescribed in $\text{km}\,\text{h}^{-1}$; finally, the limits on the mission length $L_m$ are specified in km.

FIGURE 16. Generation process of a deterministic operating cycle (dOC) from a stochastic one (sOC). All the secondary models are generated stochastically departing from the sequences of primary ones (in this paper, only the road properties are considered). A conversion is needed between the sOC and dOC formalisms.
According to (36), (37) and (38), only the stochastic variables relating to the same Dirichlet distributions are correlated. On the other hand, the composite variables are correlated via the stationary distributions for the road types. In (34) and (35), this correlation is accounted for by the Jacobian matrices J X GTA and J X UFD , respectively. It should be remarked that assuming a single road type r 1 implies that the composite variables in the UFD vector X UFD are all independent, and thus uncorrelated. This happens because none of the original simple random variables appears in more than one scalar function in (27). The same is not true for the GTA vector X GTA , since its first three components would all be functions of the stochastic standard deviation Y ≡ Y |r 1 .
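The first-order propagation of uncertainty used in this appendix may be sketched numerically as follows. The example approximates the Jacobian by central finite differences and forms the product of the Jacobian, the input covariance and the transposed Jacobian. The mapping `composite` is a hypothetical stand-in for the composite-variable functions, chosen only to show that two outputs sharing one simple variable become correlated even when the input covariance is diagonal.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of a vector function f at point x."""
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f(x), dtype=float)
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps * max(1.0, abs(x[j]))
        f_plus = np.asarray(f(x + step), dtype=float)
        f_minus = np.asarray(f(x - step), dtype=float)
        J[:, j] = (f_plus - f_minus) / (2.0 * step[j])
    return J

def propagate_covariance(f, mean, cov):
    """First-order approximation: cov_out = J cov_in J^T, J evaluated at mean."""
    J = numerical_jacobian(f, mean)
    return J @ np.asarray(cov, dtype=float) @ J.T

# Hypothetical composite variables: both depend on the first simple variable
def composite(x):
    return np.array([x[0] + x[1], x[0] + 2.0 * x[2]])

cov_in = np.diag([1.0, 4.0, 9.0])   # uncorrelated simple variables
cov_out = propagate_covariance(composite, [0.5, 1.0, 2.0], cov_in)
```

For this linear mapping the approximation is exact, and the nonzero off-diagonal entry of `cov_out` reflects the shared dependence on the first input.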

E. IMPLEMENTATION AND COMPUTATIONAL DETAILS
The present Section describes the practical implementation of a dOC, starting from the corresponding sOC description, along with details concerning the simulation of longitudinal vehicle dynamics in VehProp. In particular, Figure 16 is a schematic illustration of the typical workflow needed to synthesize a reference dOC starting from an equivalent sOC. First, the primary models are generated over a specific mission distance, which may either be prescribed or simulated using a suitable distribution, as done in the present paper. The primary models are also simulated simultaneously, since they do not interact explicitly, owing to the principles of independence and parsimony upon which the OC is built. The secondary sOC models are then derived from the primary ones. More specifically, the simulation of road properties may be carried out using the WAFO package implemented in MATLAB® [64], [65]. The sequences obtained using this procedure need to be converted into the dOC formalism. For example, curvature and topography are translated into curviness and altitude. 17 From the signed curvature, the actual road profile and the tangent vector to the trajectory are also deduced numerically using Fresnel integrals [49]. The dOC parameters, plus their location in either space or time (or both for the traffic density), are then encoded in the dOC description and tabulated. Finally, a longitudinal vehicle model is simulated in the VehProp environment, consisting of a standard set of equations implemented in MATLAB/Simulink®, similar to those that may be found in any reference textbook [7]. At each time step, the dOC parameters are used as input to the governing equations of motion, and intermediate values are calculated by using a suitable interpolation law (as explained more extensively in [9]). For a dOC of 250 km, a simulation in VehProp typically takes around 1 min on a standard personal computer.
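The conversion from sOC sequences to dOC quantities can be illustrated with a short numerical sketch. It integrates a sampled signed curvature once to obtain the heading and a second time to obtain the planar road profile, and integrates the road grade to obtain the altitude. This is a generic trapezoidal scheme written in Python, not the MATLAB implementation used with VehProp; for piecewise-linear curvature the position integrals reduce to the Fresnel integrals mentioned above. The grade is assumed to be expressed as rise over run under a small-slope approximation, and all names and values are hypothetical.

```python
import numpy as np

def integrate_road(kappa, grade, ds):
    """Convert curvature (1/m) and grade (rise/run) samples, spaced ds metres
    apart, into heading (rad), planar position (m) and altitude (m)."""
    kappa = np.asarray(kappa, dtype=float)
    grade = np.asarray(grade, dtype=float)

    def cumulative_trapezoid(y):
        # Running trapezoidal integral starting from zero
        out = np.zeros_like(y)
        out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * ds)
        return out

    heading = cumulative_trapezoid(kappa)
    x = cumulative_trapezoid(np.cos(heading))   # tangent vector is (cos, sin)
    y = cumulative_trapezoid(np.sin(heading))
    altitude = cumulative_trapezoid(grade)
    return heading, x, y, altitude

# Hypothetical check: constant curvature 1/100 m over one full revolution
n = 6284
ds = 2.0 * np.pi * 100.0 / (n - 1)
heading, x, y, altitude = integrate_road(np.full(n, 0.01), np.full(n, 0.02), ds)

# Intermediate values between tabulated positions, here by linear interpolation
s = np.arange(n) * ds
altitude_at_123m = np.interp(123.0, s, altitude)
```

With constant curvature the integrated profile closes on itself after one revolution, which gives a simple sanity check on the scheme; linear interpolation is only one possible choice for the interpolation law.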