A Logit Mixture Model Estimating the Heterogeneous Mode Choice Preferences of Shippers Based on Aggregate Data

Understanding the modal split in freight transportation is a key factor for the successful implementation of innovations. Mode choice models should therefore be as representative of reality as possible. The use of disaggregate shipment data can help achieve this. However, shipment data are often unavailable due to confidentiality issues. As a result, numerous models using only aggregate data have been developed, but their capacity to capture heterogeneity in preferences remains limited. In this paper, we propose a Weighted Logit Mixture model to estimate heterogeneous mode choice preferences of shippers directly from aggregate data. The proposed Weighted Logit Mixture is applied to a case study along the European Rhine-Alpine corridor and makes it possible to estimate the probability distribution of the cost sensitivity among the population. The estimation results show that there is substantial variation in the cost sensitivity regarding intermodal transport. The proposed methodology is also compared to a state-of-the-art Weighted Logit model to assess its potential. This comparison reveals that the proposed Weighted Logit Mixture exhibits at least a similar predictive power to the benchmark while achieving a better description of the population's preferences, which enables policy-makers to take better informed decisions and appropriate actions.


I. INTRODUCTION
International freight transport plays a significant role in worldwide CO2 emissions. Its share has been estimated at more than 7% of global emissions in 2015 [1]. Regarding land transport, road is by far the most used modality. In Europe, freight transport on the road represented more than 70% of the tonne-kilometers traveled in 2018 [2]. Therefore, modal shift to rail or water freight transport is a key objective of the European Green Deal to move toward sustainable mobility [3], [4]. Besides the use of new policies or regulations, the attractiveness of waterborne and rail transport can be enhanced through innovations, e.g., smart navigation or coordinated lock scheduling. These can address several aspects of the transport, such as cost reduction or time savings. In order to take appropriate action, it is crucial to accurately represent and understand the modal split, its drivers and its potential evolution. Since the 1980s, various freight mode choice models have been developed following similar methodologies as for passenger transport [5]. The outcome of these models is typically the probability of choosing a given alternative to ship a good from origin to destination. A so-called alternative can consist of a single mode but can also be more complex, e.g., a combination of modes, a mode chain or a specific route.
The review of this article was arranged by Associate Editor Edwin van Hassel.
Depending on the scale observed, aggregate and disaggregate models are differentiated. Aggregate models refer to situations where the Origin-Destination (OD) flows of cargo between regions are observed, whereas disaggregate models make use of shipment data [6]. The latter present a greater level of detail and better depict the preferences of the decision-maker [7]; however, shipment data are often difficult to acquire due to their commercially sensitive nature [8]. Furthermore, an international scope requires that companies active in different regions share their data with researchers and that the number and variety of firms are sufficient to be representative of the whole population. This requires a laborious data collection process, with no guarantee of success.
On the other hand, aggregate models make sense in an international freight transport context since modal share is strongly influenced by the geography and the commodity mix [9]. One can reasonably assume that firms belonging to the same industry sector with identical available transport infrastructure and services will exhibit similar behaviors. Therefore, OD flows between regions, especially when segmented into commodity types, are considered to be representative for the whole population [10]. However, there remains underlying heterogeneity since it is impossible to observe all factors influencing the mode choice process, all the more with aggregate data.
The main contribution of this work is to consider heterogeneity explicitly in the aggregate mode choice model without the need of disaggregate shipment data or additional data handling. Instead of assuming that the same behavior is shared by the whole population, we allow the preferences to be randomly distributed. Therefore, the inherent heterogeneity is taken into account in the modal share estimation process. Moreover, the methodology is applied to real-world data along the European multimodal Rhine-Alpine corridor.
This paper is structured as follows: in Section II, a literature review is provided, after which we describe the model and its characteristics in Section III. The proposed methodology is then applied to a concrete case study, introduced in Section IV and we present the main results in Section V. Finally, conclusions and further research directions are provided in Section VI.

II. RELATED WORK
This section gives an overview of the existing freight mode choice models, focusing on aggregate models that have been applied to a real-world situation. For a thorough review of freight mode choice models, the reader is referred to the following works [5], [8], [11].

A. AGGREGATE MODE CHOICE MODELS
At early stages, mode choice was estimated through regression models based on cost and demand functions for freight transport [12]. Several optimization models have been developed later to assign the flows to their corresponding mode and route in the freight transport network [13], [14], [15], [16], [17]. They aim at minimizing the costs and are solved employing shortest path algorithms. This cost minimization can also be used as a control rule within a freight transport network simulation [18], [19]. But the most prevalent model to estimate the mode choice in freight transport is the Multinomial Logit (MNL), or one of its variations [8].
The MNL is based on the Random Utility Maximization (RUM) principle applied to the context of discrete choice [20], [21]. Although designed for disaggregate models, this methodology can also be applied to aggregate mode choice models [5], [10]. Some studies estimate their model directly with OD flows, while others proceed to a disaggregation of the flows before applying the model. The latter can be seen as a hybrid technique: the mode choice model is generally estimated with disaggregate data (through a survey of shippers or a Commodity Flow Survey, like the one gathered by the US Bureau of Transportation Statistics [22]), then the model can be used for modal share estimation by disaggregating the OD flows into shipment inputs. Zhang et al. [23] use a shippers' survey to estimate the coefficients of a binary Logit model. The two considered alternatives are truck only and intermodal transport. To compute modal shares with the estimated model, the aggregate freight flows in tons are then decomposed into smaller units, such as twenty-foot equivalent units (TEU), using a predetermined distribution of the weight per TEU. The Aggregate-Disaggregate-Aggregate (ADA) methodology [24], proposed by Ben-Akiva and de Jong, pushes the concept further by converting zone-to-zone flows into firm-to-firm flows and combining transport chain choice with other logistics decisions (e.g., shipment size, type of loading unit). The choice model itself is estimated on a Commodity Flow Survey with generalized costs as a "disutility" function. The authors mention that the estimation of the model is also feasible with only OD data: it can be achieved by setting these data as targets and iteratively calibrating the parameters until the model's output is close enough to the targets.
The aforementioned "hybrid models" between aggregate and disaggregate models present the advantage that they are estimated with data from real decision-makers (shippers, firms). The models are thus perfectly consistent with the RUM theory, as they compute the (dis)utility of concrete individuals. Nevertheless, only aggregate data are available when the models are used for forecasting: shipment surveys are indeed not available for each year and every region. Data at the firm or shipment level are then produced using predefined probability distributions.
Other authors directly use the available data (OD flows) to estimate their mode choice models. From a theoretical point of view, this is more debatable since the data do not relate to a concrete agent capable of decision. However, this approach presents the advantage of not handling the data before applying the model. The Weighted Logit methodology of Rich et al. [10] assumes that OD pairs and commodity groups are representative of the population. During model estimation, the flow on each pair and for each commodity group is then used to weigh the importance of the respective pair and group. Their Logit model is applied to the crossing of the Øresund region (Denmark-Sweden) and evaluates the choice between truck, ship, train and combinations of truck with the two other modes. Jourquin and Beuthe [25] also apply a Weighted Logit to compute cost and time elasticities at a trans-European level. They especially focus on the Benelux region to evaluate the impact of geographical aggregation (NUTS-2 vs. NUTS-3)¹ on the elasticities for three modes, namely road, rail and inland waterways transport (IWT). Jourquin [27] further applies Box-Cox transformations [28] to the cost, time and distance variables within a Weighted Logit model. Indeed, these attributes are often correlated with each other in an aggregate mode choice context. The study's results show that the Box-Cox transforms can improve the validity and accuracy of the model's estimates. Albert and Schaefer [29] present a standard MNL to determine the modal split between air, truck and rail in the US. Instead of estimating the model through likelihood maximization (as in [10], [25], [27]), the estimation is performed via ordinary least squares. A similar procedure is used by Nuzzolo et al. [30] to simulate the modal split of Italian import and export flows between four alternatives (road, road-railway, road-sea, air).
As stated in the introduction, the choice alternatives can also be a transport chain, i.e., a sequence of multiple transport modes. This occurs if the cargo is transshipped from one mode to another along the way from origin to destination. In WORLDNET [31], a simulation of international cargo flows, an MNL is applied to assign the OD flows to transport chains in the network. To avoid taking into account every feasible alternative, the choice is restricted to the k cheapest chains, with k being modifiable to allow reasonable computation time. However, no transport chain data are available, only uni-modal OD flows. The model is then estimated iteratively by adjusting its coefficients and adding shadow prices to the network until the model's output fits the data. In the freight transport model BasGoed [32], the unavailability of transport chain data is remedied by constructing multi-modal chains from uni-modal data. This is done with heuristics based on practical assumptions.
We notice that there exist several types of aggregate mode choice models coming with different degrees of data handling. For "hybrid models", data at disaggregate level need to be generated from the available aggregate data with some chosen probability distributions. Similarly, when transport chains are used as alternatives, these chain data have to be built using some heuristics on the available data. In contrast, the Weighted Logit methodology does not require any data handling.

B. HETEROGENEITY REPRESENTATION
A key challenge of freight mode choice models is to capture the heterogeneous preferences [33]. In the context of aggregate mode choice, there are at least two aspects of heterogeneity to consider. Firstly, a shipper's behavior can vary significantly with the type of commodity that is transported [23] and its value. For example, bulk cargo does not require the same transport conditions as containerized cargo; likewise, the lead time is a more important criterion for perishable commodities than for building materials. The second aspect concerns geography: indeed, regional particularities might impact the mode choice process due to different transport infrastructure [10], transport services or culture. Heterogeneity is also present within a region, as not all the established shippers will behave identically [34]. However, this last point cannot be captured explicitly because of the aggregate nature of the data.
¹ The NUTS is the official division of the EU and the UK for regional statistics [26].
Under the RUM theory, two advanced variants of the MNL allow capturing heterogeneity: the Logit Mixture Model and the Latent Class Model. The former allows the coefficients of the utility functions to be randomly distributed instead of fixed [35], whereas the latter splits the population into classes with coefficients that differ from one class to another [36].
Among the works reviewed previously, the ADA methodology is the most flexible to take heterogeneity into account as it allows the use of a Mixture model to capture variations of preferences or correlation between alternatives [24]. The method also estimates various coefficients according to the commodity type being shipped. Other studies also perform segmentation with respect to the commodity type [10], [25], [27], [30]: the coefficients of the utility functions are then estimated separately for each segment. One of these works [30] goes a step further by also determining different coefficients regarding the shipping direction (import or export). Another model segments the data according to the types of OD pair, namely: hinterland to port, port to hinterland, hinterland to hinterland, and port to port [32]. A last study directly considers heterogeneous data sources by modifying the error covariance term in the model's formulation [29].
The existing studies mostly use segmentation on observable data (commodity, geography) to express some heterogeneity. Only the ADA methodology can capture heterogeneity with respect to unobserved attributes; however, this requires some disaggregate data as well.

C. CONTRIBUTION OF THIS STUDY
Within this research, we propose to estimate the mode choice model without making any assumptions on the data. Therefore, a Weighted Logit methodology is adopted [10]. We express the heterogeneous preferences in the model by introducing a Mixture formulation. The comparison of our approach with the existing models is shown in Table 1. The reader can notice the absence of data handling and the expression of preferences' heterogeneity with respect to unobserved attributes, which is allowed by the proposed modeling.
The only assumption to be made concerns the probability distribution of a given parameter among the population; the parameters are then estimated directly from the data. This method reveals the underlying heterogeneity in the population. We thus propose a Weighted Logit Mixture (WLM) model that aims at staying as close as possible to the actual situation, so as to depict it accurately. Table 1 contains some other "direct" models (that do not require additional data handling), but they do not account explicitly for unobserved heterogeneity. The ones that consider some kind of heterogeneity mostly use a deterministic segmentation according to the commodity. The Mixture methodology proposed in this research goes a step further by depicting heterogeneity within the segments themselves, thus extracting more information from the aggregate data.

III. ESTIMATION METHOD
The proposed WLM aims at combining the advantages of the Weighted Logit methodology [10] and the Mixture modeling [35]. The former allows estimating the mode choice model directly from aggregate OD flows, whereas the latter enables the introduction of heterogeneous preferences among the population. We first describe the Weighted Logit method, on which our approach is based and that we will use as benchmark, and then we explain how the Mixture formulation is introduced.

A. WEIGHTED LOGIT MODEL
The model's inputs are the OD matrices for each mode as well as the attributes related to each mode on each OD pair (e.g., cost, time, accessibility). In practice, these attributes would vary per container depending on its weight, due time, precise origin and destination, etc. However, due to the unavailability of shipment data, it is assumed that all containers shipped on the same OD pair share the same mode attributes.
We formulate a utility function $U_s^{m,q}$ for each mode $m$ and each container $s$ on OD pair $q$, which can be expressed according to the following formula:
$$U_s^{m,q} = V_s^{m,q} + \varepsilon_s^{m,q} \qquad (1)$$
where $V_s^{m,q}$ is the systematic component of the utility function and $\varepsilon_s^{m,q}$ is the random component, which is assumed to follow an Extreme Value distribution. The systematic part can be derived from the set $I$ of considered attributes for each mode:
$$V_s^{m,q} = \alpha^{m} + \sum_{i \in I} \beta_i^{m} X_i^{m,q} \qquad (2)$$
This formulation contains an alternative specific constant $\alpha$ and a sum expressing the impact of each attribute's value $X$ on the utility. This impact is expressed by the related coefficient $\beta$, which can be mode-specific or identical for all modes. In a classic MNL, the same $\alpha$ and $\beta$ parameters are assumed to be shared by the whole population. Note that the right-hand side in (2) is identical for all containers due to the assumption that shipments share the same mode attributes on a given OD pair. As a result, the container index $s$ can be dropped and the probability to choose mode $m$ among the set of available modes $M$ on OD pair $q$ is computed using the following expression:
$$P_q(m) = \frac{e^{\mu V^{m,q}}}{\sum_{j \in M} e^{\mu V^{j,q}}} \qquad (3)$$
where $\mu$ is a "scale parameter" generally normalized to one. The estimation of the $\alpha$ and $\beta$ parameters is performed through a maximum likelihood estimation, in which the log-likelihood $LL$ is defined as:
$$LL = \sum_{q \in Q} \sum_{s \in S_q} \sum_{m \in M} y_s^{m,q} \ln P_q(m) \qquad (4)$$
with $Q$ the full set of OD pairs, $S_q$ the full set of shipments on OD pair $q$ and $y_s^{m,q}$ a dummy variable equal to one if mode $m$ is chosen for container $s$ on OD pair $q$.
Since the mode choice probability is independent of $s$, (4) can be rewritten as:
$$LL = \sum_{q \in Q} \sum_{m \in M} w^{m,q} \ln P_q(m) \qquad (5)$$
where $w^{m,q}$ is the total volume (in TEUs) shipped by mode $m$ on OD pair $q$, which acts as a weight of the log-likelihood function.
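As an illustration of the weighted log-likelihood described above, the following minimal Python sketch computes Logit probabilities per OD pair and sums the volume-weighted log-probabilities. The function names, the OD labels and the toy utilities and volumes are hypothetical and only serve the example; the scale parameter is set to one, as in the text.

```python
import math

def mnl_probabilities(utilities, mu=1.0):
    """Logit probabilities for one OD pair, given {mode: systematic utility}."""
    v_max = max(utilities.values())  # subtract the max for numerical stability
    expv = {m: math.exp(mu * (v - v_max)) for m, v in utilities.items()}
    total = sum(expv.values())
    return {m: e / total for m, e in expv.items()}

def weighted_log_likelihood(utilities_by_od, volumes_by_od):
    """Sum over OD pairs and modes of volume (TEU) times ln P_q(m)."""
    ll = 0.0
    for q, utilities in utilities_by_od.items():
        probs = mnl_probabilities(utilities)
        for m, w in volumes_by_od[q].items():
            if w > 0:  # modes without observed flow contribute nothing
                ll += w * math.log(probs[m])
    return ll

# Hypothetical toy data: two OD pairs, three modes (not taken from the paper)
utilities_by_od = {
    "A-B": {"road": 0.0, "rail": -1.0, "iwt": -0.5},
    "A-C": {"road": 0.2, "rail": -0.3, "iwt": 0.1},
}
volumes_by_od = {
    "A-B": {"road": 700.0, "rail": 100.0, "iwt": 200.0},
    "A-C": {"road": 300.0, "rail": 150.0, "iwt": 250.0},
}
ll = weighted_log_likelihood(utilities_by_od, volumes_by_od)
```

In an actual estimation, the utilities would be recomputed from the candidate parameters at each iteration and the log-likelihood passed to a numerical optimizer.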

B. WEIGHTED LOGIT MIXTURE FORMULATION
The proposed Weighted Logit Mixture method is based on the approach described above but without assuming that the $\beta$ parameters are identical for the whole population. The Mixture formulation lifts this restriction by defining one or several of the $\beta$ coefficients as following a random distribution $\psi$ with mean $\bar{\beta}$ and variance $\sigma_\beta^2$. Unlike in (3), the expression of the probability has no closed form this time: thus, the likelihood maximization cannot be performed analytically. Monte Carlo simulation is used instead to obtain a "simulated likelihood". The simulation executes $R$ draws within a given distribution $\psi$ to approximate $P_q(m)$ with:
$$\tilde{P}_q(m) = \frac{1}{R} \sum_{k=1}^{R} \frac{e^{V^{m,q}(r_k)}}{\sum_{j \in M} e^{V^{j,q}(r_k)}}$$
where $r_k$ is the result of the $k$th draw in $\psi$. The simulated log-likelihood $\widetilde{LL}$ to be maximized is then expressed as:
$$\widetilde{LL} = \sum_{q \in Q} \sum_{m \in M} w^{m,q} \ln \tilde{P}_q(m)$$
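The Monte Carlo approximation can be sketched in a few lines of Python. This is only an illustrative sketch, not the estimation code used in the study: the function name, the `coef_from_draw` callable (mapping a standard normal draw to a coefficient value) and the toy inputs are assumptions made for the example, and a single random coefficient is applied to one attribute `x`.

```python
import math
import random

def simulated_probabilities(fixed_utils, x, coef_from_draw, n_draws=1000, seed=42):
    """Average Logit probabilities over draws of one random coefficient.

    fixed_utils: {mode: non-random part of the utility}
    x: {mode: attribute value multiplied by the random coefficient}
    coef_from_draw: hypothetical callable turning a N(0, 1) draw into a coefficient
    """
    rng = random.Random(seed)
    modes = list(fixed_utils)
    acc = {m: 0.0 for m in modes}
    for _ in range(n_draws):
        r = rng.gauss(0.0, 1.0)          # one draw r_k
        beta = coef_from_draw(r)
        v = {m: fixed_utils[m] + beta * x[m] for m in modes}
        v_max = max(v.values())          # stabilize the exponentials
        expv = {m: math.exp(v[m] - v_max) for m in modes}
        total = sum(expv.values())
        for m in modes:
            acc[m] += expv[m] / total
    return {m: acc[m] / n_draws for m in modes}

# Hypothetical example: coefficient with a negative Lognormal distribution
fixed = {"road": 0.0, "rail": 0.3}
cost = {"road": 1.0, "rail": 0.5}
p_sim = simulated_probabilities(fixed, cost, lambda r: -math.exp(0.5 + 0.8 * r))
```

When `coef_from_draw` returns a constant, the simulated probability reduces to the closed-form Logit probability, which gives a simple sanity check for an implementation.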

IV. CASE STUDY
We apply the proposed methodology to represent the mode choice for containerized goods along the European multimodal Rhine-Alpine (RA) corridor, focusing on the Rhine section of the corridor (see Fig. 1) where 3 transport modes are accessible: road, rail and IWT. Attributes used in freight mode choice typically consist of the cost, time, reliability, flexibility, frequency, tractability, emissions, number of transshipments, probability of damage [8], [38], [39]. In addition, the availability (or accessibility) of a mode represents an influential driver of the mode choice [39], [40]. For intermodal transport, the proximity of terminals is an important decision factor [41]: existing models use a dummy variable indicating if rail tracks and quays are accessible to a firm [42], [43] or a qualitative evaluation of the access to intermodal facilities [44]. The accessibility of road transport can also be included, for example with the highway density of a zone [45], which impacts positively the utility of road transport, or with a dummy variable indicating high traffic OD pairs [46], whose impact on road utility is negative.
In this study, we consider the accessibility a expressed as the number of terminals in both zones of origin and destination for IWT and rail and as the number of highway junctions in both zones for road. These data have been manually collected through the RA corridor info system [47]. Moreover, the weekly frequency f of IWT and rail services on the OD pair is included. These data have been collected within the NOVIMOVE project [48] and completed using the operators' websites.
The costs c of transporting and handling a container from origin to destination, expressed in thousands of euros per TEU, are taken from a conference paper [49]. In that work, the transport costs per container are estimated between NUTS-2 regions for each mode.² For road transport, costs are expressed as a sum of distance- and time-based costs (expressed in euros per km and euros per hour). The former include fuel, maintenance and tires; the latter mainly consist of labour, depreciation and insurance. These costs are then respectively multiplied by the distance and the time from origin to destination. The cost structure for rail transport is also composed of distance- and time-based costs, but some fixed costs are added in the computation to account for the related shunting operations. For IWT, the transport costs comprise voyage costs (i.e., fuel, port dues and infrastructure charges) and operating costs. The latter are further divided into maintenance costs and crew costs, which are proportional to the duration of the voyage. Finally, the model also includes a dummy variable p equal to one if either the origin or the destination zone contains a seaport.³ It is added in the utility function of IWT: the idea is that having a port in the origin or destination will facilitate the use of waterway transport and that no road haulage will be needed.
A key note is that time could not be directly included in the model. Several estimations have been conducted with the time attribute, but the associated coefficient was consistently not significant. This is because the costs are estimated from travel times in the considered paper [49], as described above. Cost and time are then strongly correlated to each other and the model cannot be estimated with these attributes together. Nevertheless, the omission of the time attribute in our model does not mean that it does not play a role but rather does so (to some extent) through the cost attribute.
We evaluate a standard Weighted Logit to serve as a benchmark. The container volume data are taken from the ASTRA model [50]: OD matrices are available for each mode and several years. They represent the annual cargo flows between European regions at the NUTS-2 level. Based on (2), the following systematic utility functions are defined for each mode (the index for OD pair q is omitted for ease of notation):
$$V^{\mathrm{IWT}} = \beta_{c,\mathrm{Inter}}\, c^{\mathrm{IWT}} + \beta_{a,\mathrm{Inter}}\, a^{\mathrm{IWT}} + \beta_{f}\, f^{\mathrm{IWT}} + \beta_{p}\, p$$
$$V^{\mathrm{Rail}} = \alpha^{\mathrm{Rail}} + \beta_{c,\mathrm{Inter}}\, c^{\mathrm{Rail}} + \beta_{a,\mathrm{Inter}}\, a^{\mathrm{Rail}} + \beta_{f}\, f^{\mathrm{Rail}}$$
$$V^{\mathrm{Road}} = \alpha^{\mathrm{Road}} + \beta_{c,\mathrm{Road}}\, c^{\mathrm{Road}} + \beta_{a,\mathrm{Road}}\, a^{\mathrm{Road}}$$
with $\alpha^{\mathrm{IWT}}$ being normalized to zero, thus setting the reference level. In the proposed formulation, two different $\beta$ coefficients for cost and accessibility are estimated: one for the road alternative and one for intermodal alternatives (rail and IWT).⁴ Regarding cost, this allows considering a different cost sensitivity with respect to the mode that is considered. For accessibility, this is because this attribute is measured differently for road than for intermodal transport.
² Besides transport costs, other cost components are estimated in the paper, such as reliability costs. However, they are not usable for our model because of their limited variability: they are either estimated using fixed values or strongly correlated to the transport costs.
³ The considered seaports are Amsterdam, Rotterdam, Antwerp and Zeebrugge.

A. HETEROGENEITY REPRESENTATION
Based on the benchmark formulation, we estimate a WLM by allowing the cost coefficient $\beta_c$ to be randomly distributed among the population. We assume that it follows a Lognormal distribution with parameters $\mu_c$ and $\sigma_c^2$. The semi-infinite support of this distribution ensures that the estimated value of the cost coefficient will have a negative sign. This a priori assumption is commonly used because a positive cost coefficient is inconsistent with the theory of rational economic behavior [51]. Indeed, it is unrealistic that a cost raise for a given mode (everything else being equal) would cause an increase in its utility. Under the defined Lognormal distribution, $\beta_c$ is then expressed as:
$$\beta_c = -\exp(\mu_c + \sigma_c Z)$$
with $Z$ a standard normal variable; in this case $\psi$ is thus $N(0, 1)$. The following expressions:
$$\bar{\beta}_c = -\exp\left(\mu_c + \frac{\sigma_c^2}{2}\right)$$
$$\sigma_{\beta_c}^2 = \left(\exp(\sigma_c^2) - 1\right) \exp\left(2\mu_c + \sigma_c^2\right)$$
are used to obtain the mean $\bar{\beta}_c$ and variance $\sigma_{\beta_c}^2$ of the cost coefficient. We use a maximum simulated likelihood estimation with 10,000 draws in $N(0, 1)$ to determine the values of the WLM parameters. Both the Weighted Logit (benchmark) and the WLM are estimated and validated using the software package Biogeme [52].
⁴ A formulation with distinct β coefficients for each of the three modes was also investigated. However, the estimation revealed that the β coefficients for rail and IWT were not significantly different from each other. The same remark holds for frequency.
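To make the link between the Lognormal parameters and the summary statistics of the cost coefficient concrete, the short Python sketch below evaluates the standard Lognormal moment formulas. It assumes the sign convention in which the coefficient is the negative of a Lognormal variable (so that it is always negative); the function name and the parameter values are hypothetical and are not the estimates reported in the paper.

```python
import math

def negative_lognormal_summary(mu_c, sigma_c):
    """Mean, variance and mode of beta = -exp(mu_c + sigma_c * Z), Z ~ N(0, 1)."""
    mean = -math.exp(mu_c + sigma_c ** 2 / 2.0)
    variance = (math.exp(sigma_c ** 2) - 1.0) * math.exp(2.0 * mu_c + sigma_c ** 2)
    mode = -math.exp(mu_c - sigma_c ** 2)  # most frequent value of the coefficient
    return mean, variance, mode

# Hypothetical parameter values, for illustration only
mean, variance, mode = negative_lognormal_summary(mu_c=0.0, sigma_c=1.0)
```

Because the distribution is skewed, the mean lies further from zero than the mode, and the gap widens as `sigma_c` grows.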

B. VALIDATION OF THE MODELS
We proceed to out-of-sample validation using flow data of two different years based on a procedure described by Jourquin [27]. We first compute the predicted modal shares on the whole corridor for both models and compare them with the actual shares. Then we assess the accuracy at the OD level: this is done by computing the correlation coefficient between the container volumes returned by our WLM (or the benchmark) and the actual ones on every OD pair for each mode.⁵ Besides this, we compute the (point) cost elasticities for the benchmark and the WLM, which represent how a change in the transport cost influences the probability to choose a given modality. The point elasticity $E_{c_{k,q}}^{P_q(m)}$ of the probability $P_q(m)$ to choose mode $m$ on an OD pair $q$ with respect to the cost of mode $k$ is expressed as:
$$E_{c_{k,q}}^{P_q(m)} = \frac{\partial P_q(m)}{\partial c_{k,q}} \cdot \frac{c_{k,q}}{P_q(m)}$$
If $k = m$, the direct cost elasticity is obtained; otherwise, we get the cross cost elasticity. When it is computed for the WLM, $P_q(m)$ is replaced by the simulated probability $\tilde{P}_q(m)$.
To obtain elasticity values for the whole corridor, we proceed to a weighted average of the computed elasticities with respect to the flow on each OD pair. The resulting estimates are then assessed by comparison with elasticity values from previous studies.
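For the fixed-coefficient benchmark, the point elasticities have a well-known closed form when the cost enters the utility linearly, which the following Python sketch uses before taking the flow-weighted average over OD pairs. The function names, the data layout and the numbers are hypothetical; weighting by the total flow of each OD pair is an assumption made for this sketch. For the WLM, the derivative would instead have to be averaged over the draws.

```python
def point_cost_elasticity(probs, costs, betas, m, k):
    """Elasticity of P(m) w.r.t. the cost of mode k in a Logit with linear cost.

    Direct (k == m):  beta_m * c_m * (1 - P(m))
    Cross  (k != m): -beta_k * c_k * P(k)
    """
    if m == k:
        return betas[m] * costs[m] * (1.0 - probs[m])
    return -betas[k] * costs[k] * probs[k]

def corridor_elasticity(od_data, m, k):
    """Flow-weighted average of the OD-level point elasticities."""
    weighted = sum(
        d["flow"] * point_cost_elasticity(d["probs"], d["costs"], d["betas"], m, k)
        for d in od_data
    )
    return weighted / sum(d["flow"] for d in od_data)

# Hypothetical toy data for two OD pairs and two modes
betas = {"road": -2.0, "rail": -3.0}
od_data = [
    {"flow": 1.0, "probs": {"road": 0.5, "rail": 0.5},
     "costs": {"road": 1.0, "rail": 1.0}, "betas": betas},
    {"flow": 3.0, "probs": {"road": 0.5, "rail": 0.5},
     "costs": {"road": 2.0, "rail": 1.0}, "betas": betas},
]
direct_road = corridor_elasticity(od_data, "road", "road")
cross_rail_road = corridor_elasticity(od_data, "rail", "road")
```

With negative cost coefficients, direct elasticities come out negative and cross elasticities positive, as expected.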

C. ADDITION OF VALUE OF TIME
Once the proposed WLM is validated, we investigate the impact of the Value of Time (VoT) on the mode choice. By Value of Time, we mean the capital costs incurred while transporting the cargo. We make use of the VoT proposed by Hintjens et al., namely 1.12 euros per hour per TEU. This figure is based on the average value transported per TEU with a depreciation of four years [53]. This value is then multiplied by the total travel time for each mode, including the pre- and post-haulage for intermodal transport, and added to the transport costs c. We finally re-estimate the WLM with these new costs. This will allow us to evaluate the influence of time on the model's coefficients and on shippers' heterogeneity.
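The cost adjustment amounts to simple arithmetic, sketched below in Python. The function name and the example values are hypothetical; the division by 1000 is an assumption made here to convert euros into thousands of euros per TEU, the unit in which the costs c are expressed.

```python
VOT_EUR_PER_HOUR_PER_TEU = 1.12  # value cited in the text

def cost_with_vot(cost_keur_per_teu, travel_time_hours):
    """Add the time-related capital cost to a transport cost given in kEUR per TEU."""
    vot_keur = VOT_EUR_PER_HOUR_PER_TEU * travel_time_hours / 1000.0
    return cost_keur_per_teu + vot_keur

# Hypothetical example: 0.5 kEUR per TEU and a 24-hour door-to-door travel time
new_cost = cost_with_vot(0.5, 24.0)
```

In this toy case the VoT term adds 26.88 euros per TEU, which is small relative to the transport cost itself.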

V. RESULTS
In this section, we present the key results of the models' estimation and compare the performance of both methods. The model is estimated with the container flow data of the year 2017, and the data of years 2016 and 2019 are used for out-of-sample validation purposes.⁶ The resulting coefficients for our WLM and the benchmark are displayed in Table 2 together with the log-likelihood. Regarding the parameters, the estimated $\beta$ coefficients for both models have the expected signs: negative for the costs, as an increase in the costs will impact the utility negatively; and positive for the accessibility, frequency and port coefficients. Indeed, the utility of intermodal transport increases together with the number of existing terminals in the origin and destination zones and the utility of road with the number of highway junctions. The same reasoning applies to the frequency coefficient. For the port coefficient, it means that having a seaport in either the origin or destination zone will increase the utility of IWT. Regarding the variation of the parameters between the two models, we notice that the ratio between $\beta_{c,\mathrm{Inter}}$ and $\beta_{c,\mathrm{Road}}$ increases when passing from the benchmark to the WLM (1.55 for the benchmark and 1.95 for the WLM). This means that the relative cost sensitivity of intermodal transport compared to road is augmented when the cost coefficient of intermodal transport is allowed to be distributed.
⁵ The third step in the approach of Jourquin, i.e., comparing volumes on the network's segments, cannot be performed in our case since the network assignment task is not included in the present study.
⁶ The year 2018 is not considered since a major drought occurred on the Rhine, thus disrupting the IWT flows compared to the year 2017.
Concerning the alternative specific constants, $\alpha^{\mathrm{Road}}$ is consistent with what is expected along the RA corridor: road is preferred to intermodal transport, all else being equal. The positive value of $\alpha^{\mathrm{Rail}}$ is unexpected, but this should be nuanced as, in both models, it is not statistically significant, i.e., not significantly different from zero. This last point suggests that our models have a satisfying predictive power. Indeed, the alternative specific constant represents the mean effect on the utility of other attributes that are not included in the utility function. When the value of $\alpha$ gets closer to zero, it means that the influence of these other attributes is decreased or, formulated differently, that the deterministic part of the utility function has an improved descriptive power. For the other coefficients, only $\beta_{a,\mathrm{Road}}$ exhibits a p-value higher than the 5% threshold, but it falls under the 10% limit in both models.

A. HETEROGENEITY REPRESENTATION
Concerning the variability of the cost coefficients, we also estimated Mixture specifications where both $\beta_{c,\mathrm{Road}}$ and $\beta_{c,\mathrm{Inter}}$, or only $\beta_{c,\mathrm{Road}}$, were log-normally distributed, but the results were unreliable since several parameters were not statistically significant. In the proposed WLM, however, the $\sigma_{c,\mathrm{Inter}}$ estimate is statistically significant, which means that there exists a variation of the cost sensitivity regarding intermodal transport among the population. The magnitude of the standard deviation estimate reveals that the preferences concerning the intermodal transport costs vary substantially. The probability distribution of $\beta_{c,\mathrm{Inter}}$ is depicted in Fig. 2.
We immediately notice that the mode of the distribution of $\beta_{c,\mathrm{Inter}}$ (which equals −6.6) is close to the fixed coefficient estimated in the benchmark. However, a large share of the population exhibits a lower (more negative) cost coefficient: the mean of the distribution is indeed almost −20. This means that the benchmark underestimates the influence of intermodal transport cost for a significant part of the population. The WLM makes it possible to explicitly capture this part of the population with a low cost coefficient or, equivalently, a higher cost sensitivity. This explains why, as noticed above, the relative cost sensitivity of intermodal transport compared to road is increased in the WLM.

B. VALIDATION OF THE MODELS
The proposed WLM is further compared to the benchmark with an out-of-sample validation, which is performed at the corridor and OD pair levels. Moreover, we compute the cost elasticities from our models and compare them to existing works.

1) CORRIDOR LEVEL
We estimate the market shares of each mode for years 2016 and 2019 with both models. The predicted modal shares are then compared with the ones measured from the existing data in Table 3.
The benchmark shares are generally closer to the actual ones, except for the share of road for year 2016. However, the relative differences remain modest for both models. The greatest absolute difference between actual and estimated shares, for both the benchmark and the WLM, occurs for the share of IWT in year 2019. This difference is −0.25% and −0.44% respectively, which represents a relative error of around 1%. We nevertheless notice that the WLM tends to overestimate the share of rail, with a relative error of approximately 7%. Other than this, the relative differences remain small for both models: it is then necessary to further compare them at a more disaggregate level.

2) OD PAIR LEVEL
We now compare the actual container flows to the ones estimated by both models on every OD pair and for each mode. To do so, the correlation coefficients between actual and estimated volumes for years 2016 and 2019 are computed. To further evaluate the models' performance, we also compute the correlation coefficients obtained when the OD pairs from Rotterdam to Antwerp and vice versa are not included. The resulting correlation coefficients are presented in Table 4.
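The computation described above can be sketched as follows. This is a minimal illustration that assumes a Pearson correlation coefficient; the OD pairs other than Rotterdam ↔ Antwerp and all volumes are hypothetical and are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical yearly volumes (TEUs) for one mode: (actual, estimated).
flows = {
    ("Rotterdam", "Antwerp"): (900_000, 780_000),
    ("Antwerp", "Rotterdam"): (400_000, 350_000),
    ("Rotterdam", "Duisburg"): (120_000, 135_000),
    ("Antwerp", "Basel"): (30_000, 22_000),
    ("Rotterdam", "Mannheim"): (60_000, 71_000),
}

seaport_pairs = {("Rotterdam", "Antwerp"), ("Antwerp", "Rotterdam")}


def correlation(flows, exclude=frozenset()):
    """Correlation between actual and estimated volumes, skipping excluded OD pairs."""
    kept = [v for od, v in flows.items() if od not in exclude]
    actual = [a for a, _ in kept]
    estimated = [e for _, e in kept]
    return pearson(actual, estimated)


r_all = correlation(flows)
r_without_seaports = correlation(flows, exclude=seaport_pairs)
print(f"all OD pairs: {r_all:.3f}")
print(f"without Rotterdam <-> Antwerp: {r_without_seaports:.3f}")
```

Because a few very large OD pairs can dominate such a coefficient, reporting it with and without them gives a more complete picture of the fit.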
The results show that both models are very successful in estimating the container volumes transported by IWT and road, but much less so when it comes to rail transport. Several reasons might explain this limited performance. Firstly, the cost estimation for rail is less detailed than for the other modes [49]. Secondly, rail transport is less available (or, at least, fewer data are reported) along the RA corridor. This means that the estimation is performed on fewer data points than for road and IWT. Finally, even when rail transport data are available, the container volumes are significantly lower than for the two other modes. In a Weighted Logit context, low volumes imply less weight in the estimation process: thus, the resulting estimators may be less accurate.
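The role of volumes as weights can be made concrete with a minimal sketch of a weighted log-likelihood, in which each mode on each OD pair contributes the logarithm of its predicted probability multiplied by its observed volume. The utilities and volumes below are hypothetical, and the function names are illustrative rather than the implementation used for the models in this paper.

```python
import math


def logit_probabilities(utilities):
    """Multinomial logit probabilities from a dict of deterministic utilities."""
    max_u = max(utilities.values())  # subtract the maximum for numerical stability
    exp_u = {m: math.exp(u - max_u) for m, u in utilities.items()}
    total = sum(exp_u.values())
    return {m: e / total for m, e in exp_u.items()}


def weighted_log_likelihood(observations):
    """Sum over OD pairs and modes of volume * log(probability)."""
    ll = 0.0
    for utilities, volumes in observations:
        probs = logit_probabilities(utilities)
        for mode, volume in volumes.items():
            ll += volume * math.log(probs[mode])
    return ll


# Hypothetical OD pair: utilities per mode and observed volumes (TEUs).
od_pair = (
    {"IWT": 0.4, "Rail": -1.0, "Road": 0.0},
    {"IWT": 50_000, "Rail": 2_000, "Road": 40_000},
)

probs = logit_probabilities(od_pair[0])
contributions = {m: v * math.log(probs[m]) for m, v in od_pair[1].items()}
print(contributions)
print(weighted_log_likelihood([od_pair]))
```

Since the rail term is scaled by a small volume, a misfit on rail changes the objective far less than a comparable misfit on IWT or road, which is one way to read the lower accuracy observed for rail.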
The influence of the Weighted Logit methodology on the predictive power is particularly visible when the OD pairs from/to Rotterdam to/from Antwerp are not considered in the correlation coefficient computation. In that case, both models perform better regarding road and rail transport but much worse for IWT. Indeed, as these two OD pairs are the only ones linking two seaports, they have at least two characteristics that distinguish them from others:
1) The number of transported containers is considerably higher (see Fig. 3). The yearly volumes reported in the dataset are around 1.5 million TEUs for Rotterdam → Antwerp and around 700'000 TEUs in the other direction. As a comparison, the third busiest OD pair has a yearly volume of around 350'000 TEUs.
2) The modal split is remarkably different. Table 5, which displays the modal shares corresponding to the particular cases in Table 4, shows this difference in modal split between the Rotterdam ↔ Antwerp pairs and the remaining ones.
As a consequence, large relative errors occur when the proposed models are used to estimate the container flows on these two OD pairs. Fig. 3 illustrates the difference in scale between the Rotterdam ↔ Antwerp pairs and all the other ones for IWT. Together with Table 5, it also shows that, for the Rotterdam ↔ Antwerp pairs, the number of containers is underestimated for IWT and overestimated for road and rail. All of this reveals that a different model (or, at least, different coefficients) should be used to estimate the "seaport-to-seaport flows". It also legitimates the approach proposed by De Bok et al., which consists in segmenting data according to the type of OD pair [32].
Finally, this analysis offers more insight into the comparison of the two models than the corridor level analysis. Indeed, in Table 4, the correlation coefficients of the WLM are almost always greater than the ones of the benchmark, suggesting that the WLM returns better estimations than the benchmark. This is supported by Table 5, where the shares estimated by our WLM are systematically closer to the actual ones than those of the benchmark. These results at the OD pair level highlight the benefits of our Mixture approach compared to the standard Weighted Logit method.
One question still remains: if the shares of our WLM are more accurate than the ones of the benchmark when looking at the Rotterdam ↔ Antwerp pairs and the remaining ones separately, then why is it not the case at the aggregate level? This is due to the compensation of the differences observed in Table 5: for almost all modes, the share differences have an opposite sign for the Rotterdam ↔ Antwerp pairs compared to the other pairs. Also, the former represent 9% of the container volume of the considered corridor, whereas the latter account for 91%. If we take IWT for year 2016 as an example, the differences are compensated as follows:
• For the benchmark: −6.78% * 9% + 0.79% * 91% = 0.11%
• For the WLM: −5.68% * 9% + 0.42% * 91% = −0.13%
which leads to the same differences, except for rounding, as reported in Table 3. These results would suggest that the benchmark is more accurate than the WLM, although the WLM shares are closer to the actual ones for both the Rotterdam ↔ Antwerp pairs and the other ones.
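The compensation effect can be reproduced with a few lines of arithmetic. The sketch below only uses the figures quoted above (volume shares of 9% and 91%, and the IWT share differences for 2016); the variable names are illustrative.

```python
# Share of the corridor's container volume carried by each group of OD pairs.
weight_seaports = 0.09  # Rotterdam <-> Antwerp pairs
weight_others = 0.91    # all remaining pairs

# IWT share differences for 2016, in percentage points (values quoted in the text).
differences = {
    "benchmark": {"seaports": -6.78, "others": 0.79},
    "WLM": {"seaports": -5.68, "others": 0.42},
}


def aggregate_difference(diff):
    """Volume-weighted average of the group-level share differences."""
    return weight_seaports * diff["seaports"] + weight_others * diff["others"]


for model, diff in differences.items():
    print(f"{model}: {aggregate_difference(diff):+.2f}")
# benchmark: +0.11
# WLM: -0.13
```

The WLM is closer to the actual shares within each group, yet its errors cancel out less completely than those of the benchmark once the two groups are weighted and summed.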
These considerations show that, even if a model seems to perform better at the aggregate level, it does not mean its predictions at the OD pair level will be more accurate. And it is the latter that really matters for a mode choice model. Hence, conclusions cannot be drawn from a comparison at the aggregate level. A validation at the OD pair level is required as it is more informative about the predictive performance of the models. In our case, the WLM has proven to give more accurate share predictions than the benchmark.

3) COST ELASTICITY
Table 6 contains the resulting cost elasticities of the benchmark and the WLM. We notice great variations between the models, especially regarding the direct elasticities that are displayed in bold. Indeed, the WLM exhibits much higher direct elasticity values (in absolute value). This is because the WLM has higher cost coefficients (in absolute value) than the benchmark, as reported in Table 2.
For both models, the direct elasticity of road is lower than for intermodal transport, meaning that the impact of a cost increase on the resulting mode share will be less important for road. Significant variations between both models also occur regarding the cross elasticities of the intermodal transport probability with respect to the cost of the road alternative. Once again, elasticities are significantly higher for the WLM than for the benchmark.
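For reference, the closed-form point elasticities of a standard multinomial logit with a linear cost term can be sketched as follows. This is a textbook simplification: it ignores the volume weighting and the mixing distribution of the WLM, it assumes that cost enters the utility linearly, and all coefficients, costs and constants below are hypothetical.

```python
import math


def logit_probabilities(utilities):
    """Multinomial logit probabilities from a dict of deterministic utilities."""
    exp_u = {m: math.exp(u) for m, u in utilities.items()}
    total = sum(exp_u.values())
    return {m: e / total for m, e in exp_u.items()}


def utilities(costs, betas, constants):
    """Linear-in-cost utilities: V_m = alpha_m + beta_m * cost_m."""
    return {m: constants[m] + betas[m] * costs[m] for m in costs}


def direct_elasticity(mode, costs, betas, probs):
    """Elasticity of P(mode) with respect to its own cost."""
    return betas[mode] * costs[mode] * (1.0 - probs[mode])


def cross_elasticity(other, costs, betas, probs):
    """Elasticity of every other mode's probability w.r.t. the cost of `other`."""
    return -betas[other] * costs[other] * probs[other]


# Hypothetical inputs; IWT and rail share one intermodal cost coefficient.
costs = {"IWT": 0.25, "Rail": 0.35, "Road": 0.45}
betas = {"IWT": -6.0, "Rail": -6.0, "Road": -3.0}
constants = {"IWT": 0.0, "Rail": -0.5, "Road": 0.5}

probs = logit_probabilities(utilities(costs, betas, constants))
for mode in costs:
    print(f"direct elasticity, {mode}: {direct_elasticity(mode, costs, betas, probs):.3f}")
print(f"cross elasticity w.r.t. road cost: {cross_elasticity('Road', costs, betas, probs):.3f}")
```

In this simplified setting, a larger cost coefficient (in absolute value) mechanically yields larger direct elasticities, which matches the pattern observed between the WLM and the benchmark.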
To put these elasticity values into perspective, they are compared to the cost elasticities estimated in recent studies, see Table 7. The work of Arencibia et al. makes use of stated preference data collected from Spanish shippers [54], whereas the model of Jensen et al. is estimated using commodity flow surveys [55]. The last two studies estimate the elasticities with a Weighted Logit (the methodology used for our benchmark), as mentioned in the literature review.
Compared to the values from other studies, the elasticities computed in this paper seem coherent. They all fall within the range of values proposed by Jourquin & Beuthe. The range they provide is particularly wide compared to the other studies, but the agreement might also be due to the fact that they also use a Weighted Logit methodology and that their geographical coverage is close to the one used in our study.

C. ADDITION OF VALUE OF TIME
The resulting coefficients of the WLM with the inclusion of VoT are reported in Table 8, together with the coefficients of the previously estimated WLM.
As expected, the addition of VoT does not have an important impact on the value and significance of the β coefficients that are not related to costs. However, it has a noticeable impact on the values of the cost parameters and the alternative specific constants α. The values of α Rail and, to a lesser extent, α Road are reduced. This means that adding this new element has improved the predictive power of the deterministic part of the utility functions of these modes.
For the cost coefficients, the absolute values of both β c,Inter and β c,Road decrease: this is because the new cost figures have been increased by the addition of VoT. As IWT and rail have higher travel times, this decrease is more important for the intermodal coefficient than for the road one. As a result, the two coefficients are closer to each other: the ratio between β c,Inter and β c,Road was almost 2, whereas it is less than 1.5 with VoT included. It means that the relative cost sensitivity of intermodal transport compared to road is decreased when considering the VoT.
Indeed, a major asset of intermodal transport is the lower costs compared to road: when VoT is not considered, shippers may then be much more sensitive to a cost increase for IWT or rail, than for road. However, the lower costs are achieved at the expense of a larger transportation time so that, when VoT is added to the out-of-pocket costs, it acts as a counterbalance. The resulting cost sensitivity with respect to intermodal transport is thus less important, but still significantly more than for road transport.
Regarding the heterogeneity of cost sensitivity, the σ c,Inter estimate remains statistically significant. Fig. 4 shows the probability distribution of β c,Inter when VoT is included compared to when it is not. The addition of VoT causes a shrinkage of the distribution and a shortening of its tail. Indeed, the value of σ β c,Inter is decreased by 52% in Table 8. This is not only due to the change in scale of β c,Inter, since its mean decreases by only 36% in absolute value. It shows that adding VoT to the model explains the heterogeneity to some extent, yet there remains heterogeneity due to attributes exogenous to the model.
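The argument that the shrinkage is not a mere scale effect can be checked with the two percentages quoted above. The sketch below assumes that the reported standard deviation and mean both describe the distribution of β c,Inter itself, so that their ratio is a coefficient of variation; if the standard deviation refers to the underlying normal variable instead, the computation does not apply as is.

```python
# Relative changes quoted in the text when VoT is added.
sd_reduction = 0.52    # standard deviation of beta_c,Inter falls by 52%
mean_reduction = 0.36  # mean of beta_c,Inter falls by 36% in absolute value

# Coefficient of variation (sd / |mean|) with VoT, relative to without VoT.
cv_ratio = (1.0 - sd_reduction) / (1.0 - mean_reduction)
cv_reduction = 1.0 - cv_ratio

print(f"CV with VoT / CV without VoT = {cv_ratio:.2f}")
print(f"relative dispersion falls by {cv_reduction:.0%}")
# CV with VoT / CV without VoT = 0.75
# relative dispersion falls by 25%
```

A pure rescaling of the coefficient would leave this ratio at 1, so the remaining reduction can be read as heterogeneity that the VoT term now explains.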

D. DISCUSSION
The results demonstrate that the proposed WLM is capable of a better estimation of the characteristics of the shippers' population while achieving a performance at least equivalent to the benchmark. In particular, the WLM reveals two important elements that cannot be captured by the benchmark: there exists a variation of cost sensitivity among the population and this variation is occurring for intermodal transport.
The significant standard deviation of the intermodal cost coefficient implies a variation of the shippers' cost sensitivity. Indeed, the cost coefficient ranges from extremely cost-sensitive shippers (with very low values of the cost coefficient) to shippers that are sensitive to cost but are likely to trade it off against some other attributes. The former category of shippers would be ignored by the benchmark since the estimated cost coefficient is relatively close to zero. This issue means that, when the model is used to simulate the demand for freight transport, an entire segment of the population is not represented. There is then a substantial risk of drawing inaccurate conclusions and taking inappropriate actions.
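The effect of ignoring the cost-sensitive segment can be illustrated with a small simulation. The sketch below compares a binary logit with a single fixed cost coefficient to a mixture in which the coefficient is drawn as −exp(z), with z normally distributed. The distribution parameters, the cost and the road utility are hypothetical, the choice set is reduced to two alternatives, and the volume weighting is left out.

```python
import math
import random


def intermodal_probability(beta_inter, cost_inter, v_road):
    """Binary logit probability of choosing intermodal over road."""
    v_inter = beta_inter * cost_inter
    return 1.0 / (1.0 + math.exp(v_road - v_inter))


def mixture_probability(mu, s, cost_inter, v_road, n_draws=20_000, seed=0):
    """Average the logit probability over draws of beta = -exp(z), z ~ N(mu, s^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        beta = -math.exp(rng.gauss(mu, s))
        total += intermodal_probability(beta, cost_inter, v_road)
    return total / n_draws


# Hypothetical values: distribution parameters, intermodal cost, road utility.
mu, s = 2.6, 0.85
cost_inter = 0.10
v_road = -1.5

fixed_beta = -math.exp(mu - s**2)  # a single coefficient placed at the mode
p_fixed = intermodal_probability(fixed_beta, cost_inter, v_road)
p_mixture = mixture_probability(mu, s, cost_inter, v_road)
print(f"fixed coefficient: {p_fixed:.3f}")
print(f"mixture:           {p_mixture:.3f}")
```

With these illustrative values, the fixed coefficient predicts a higher intermodal share than the mixture, because it does not account for the shippers in the tail who react strongly to intermodal costs.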
The fact that data does not reveal sensitivity variation concerning the cost of road could be explained by the higher cost of road transport compared to the intermodal alternatives. Shippers might be much less cost-sensitive regarding transport by truck since it is already an expensive alternative in itself. Road transport may then attract them with other attributes, such as lower transport time or increased availability.
The results also show that the addition of VoT to the WLM reduces the standard deviation of the distribution of the intermodal cost coefficient. This distribution accounts for all the different factors that play a role in shippers' cost sensitivity but cannot be explicitly captured in the model. If more contextual variables are included in the model (or if better data are collected), the distribution will become narrower and the coefficient's estimation will be improved. It thus leads to a model that better fits the real behavior of shippers, as it captures more aspects of the mode choice decision.

VI. CONCLUSION AND FUTURE RESEARCH
This paper proposes a Weighted Logit Mixture (WLM) model that estimates the variability of cost preferences among the shippers' population using only aggregate flow data, cost estimates and publicly available data. The obtained results show that the WLM is better able to estimate the population's preferences while exhibiting improved performance compared to the benchmark. The results also demonstrate that there exists a significant variation in the sensitivity regarding intermodal transport costs.
The Weighted Mixture modeling not only gives more information about the mode choice preferences; it also represents the shippers' population more realistically. Indeed, assuming that all shippers share the same behavior would mean that, for a given mode, they would all contract the same carrier, e.g., the cheapest one. While this might be true for some shippers, others also opt for more expensive services, because of contractual relationships or tracking services for example.
That is why it is crucial to analyze behavior in detail by looking into different segments, including as many contextual variables as possible, and considering heterogeneity.
With the proposed Weighted Logit Mixture, we provide a way to do so with aggregate data. By considering preference variation, this approach better supports the implementation of a specific innovation or policy by providing more precise indications concerning the diverse behaviors inherent to a large freight transport network. Similarly, the impacts of the innovation or policy can be analyzed more realistically by taking into account the heterogeneous preferences in the modal share estimations.
Nevertheless, some challenges are to be addressed to develop the full potential of this approach. Firstly, some important attributes, in particular time and reliability, are not (directly) included in the specification of the utility functions. It would be beneficial to collect data and/or come up with new metrics quantifying these attributes in order to obtain a more thorough description of the underlying behavior and achieve a better predictive power. Secondly, for the attributes included in the models, the estimates need to be as accurate as possible. In our case study, the cost for rail transport deserves a more detailed computation to obtain better predictions for volumes at the OD pair level. Thirdly, it might be difficult to obtain reliable flow data at an international scale. Data for different countries are usually collected by different statistical offices having their own methodologies. Combining these flow data could then be arduous. For this reason, we chose to make use of flow data issued from a European freight model. Finally, a proper re-estimation procedure should be developed to facilitate the update of the estimates when new data (such as shipment data or more accurate cost estimates) become available. Note that all these remarks also apply to the benchmark method.
Regarding the WLM itself, a major comment is that the amplitude of the variation of cost sensitivity may not be as high as suggested by the estimated parameters of the distribution's mean and standard deviation. The long tail of the Lognormal distribution can indeed cause an overestimation (in absolute value) of these parameters [51]. Therefore, further experiments should be conducted with different assumptions on the probability distribution to capture the sensitivity variation in greater detail.
To conclude, the WLM can estimate the variation of preferences in the whole population but it does not give any indication about what causes this variation. It would be beneficial to translate (at least partially) the probability distribution into tangible characteristics by adding more cost elements, or through deterministic segmentation. When studying container transport, it is difficult to perform this based on commodity type. However, segmentation could be conducted based on some geographical features such as the shipping distance or the different countries. A latent class formulation could also be used to reveal the various behavioral patterns leading to the observed probability distribution. Then, a similar Weighted Logit Mixture methodology can be applied to the resulting segments or classes to reveal the remaining heterogeneity of preferences.