A Mobility Model for Synthetic Travel Demand From Sparse Traces

Knowing how much people travel is essential for transport planning. Empirical mobility traces collected from call detail records (CDRs), location-based social networks (LBSNs), and social media data have been used widely to study mobility patterns. However, these data suffer from sparsity, an issue that has largely been overlooked. In order to extend the use of these low-cost and accessible data, this study proposes a mobility model that fills the gaps in sparse mobility traces from which one can later synthesise travel demand. The proposed model extends the fundamental mechanisms of exploration and preferential return to synthesise mobility trips. The model is tested on sparse mobility traces from Twitter. We validate our model and find good agreement on origin-destination matrices and trip distance distributions for Sweden, the Netherlands, and São Paulo, Brazil, compared with a benchmark model using a heuristic method, especially for the most frequent trip distance range (1–40 km). Moreover, the learned model parameters are found to be transferable from one region to another. Using the proposed model, reasonable travel demand values can be synthesised from a dataset covering a large enough population of very sparse individual geolocations (around 1.5 geolocations per day covering 100 days on average).


I. INTRODUCTION
TRANSPORTATION accounts for 24% of global CO2 emissions annually [1], presenting a major challenge to climate change mitigation. Meeting the challenge will require knowing the details of travel demand: how and how much people travel. Quantifying travel demand often relies on an origin-destination (OD) matrix [2], representing the intensity of flows of people between different zones/regions. Another extensively explored aspect is the trip distance distribution, which characterises how far people travel.
In transport planning and policymaking, different models are used to estimate travel demand either directly at the population level or through the detailed activity chains of agents. They rely on high-quality data collected through traditional methods including road traffic counting, household travel surveys, censuses, and population mobility models. These data collection methods are often costly, have small sample sizes, and are updated infrequently [3].
The review of this article was arranged by Associate Editor Meng Li.
The increased prevalence of location-aware devices over the last decade has benefited our understanding of human mobility [4], [5]. Common sources include: call detail records (CDRs); GPS-enabled devices; tracking apps on smartphones; location-based social networks (LBSNs), e.g., Foursquare; and social media data, e.g., Twitter. The mobility traces obtained from these sources are promising in quantifying the flows of people between places and how far they travel [6].
Given that geolocations are collected with triggered phone activities or volunteered reports, one salient issue is to what extent the covered traces are incomplete, i.e., the sparsity issue. Data sources like CDRs, LBSNs, and social media data only provide a partial view of the actual mobility trajectories [7]. The incompleteness of the traces limits the accuracy of the estimated travel demand. Nevertheless, these sources are collectively abundant, especially LBSNs and social media data, which are relatively less expensive and available globally. In order to extend the use of these data sources, it is important to have appropriate techniques to fill the gaps in sparse mobility traces.
This study proposes a mobility model that fills the gaps in sparse mobility traces, tested on geolocations collected from social media data. Using the model-processed data, one can subsequently synthesise travel demand on two aspects: the share of trips between spatial zones and the trip distance distribution. The proposed model extends the fundamental mechanisms of exploration and preferential return to synthesise mobility trips [8], accommodating mobility traces that are individually sparse but collectively abundant. We first calibrate and validate the model with official data on daily travel demand. We then apply the model to represent the travel demand in two countries and one metropolitan region. The model parameters show good transferability from one region to another.
The remainder of this paper is organised as follows. The rest of this section reviews the work related to different data sources used to estimate the two aspects of travel demand: trip distance distribution and flows between spatial zones. It covers the shortcomings of these data sources, specifically related to sparsity, followed by a brief summary of the objectives of the present study. Section II describes the model design, and Section III describes the model experiment. The results are presented in Section IV, and Section V discusses the findings, identifies future research needs, and presents the conclusions.

A. RELATED WORK
Common models for travel demand estimation include the four-step model [9], activity-based models [10], and agent-based models (ABM) with a synthetic population [11]. These models rely on data collected from traditional travel surveys and censuses. For instance, one study uses the data from a yearly census and a national household travel survey to create a synthetic population and its travel demand [11]. These data sources have careful sampling designed to statistically represent the true population. However, they also have many shortcomings such as being costly to collect and having low sampling rates, short survey duration, under-reporting of trips, and being out-of-date [12]. Travel surveys also fail to capture most of the infrequent long-distance trips [13].
Travel demand estimation has benefited from increasingly available location-aware devices [12] that provide a variety of human mobility records. Using data from GPS-enabled devices, a multi-scale model has been proposed to synthesise mobility traces that yield a representative trip distance distribution [5]. Another study has updated origin-destination matrices using aggregated GPS data [14]. The movements of a large population can be captured by CDRs [15], [16], [17]. CDRs have been used to develop a microscopic individual mobility model [8] and reveal fundamental mobility laws such as the distance-frequency scaling law [18]. Wang et al. (2018) have explored social disparities of travel distances using 650 million geotagged tweets [19]. Liao et al. (2021) have modelled the overall travel demand using geolocations of Twitter data, showing good agreement with the ground-truth data [20].
However, data collected from CDRs, LBSNs, and social media are collectively abundant but individually sparse. For example, in Twitter data, the top geotag users generate 1-3 geolocations per active day on average as revealed by the present study. In other words, these data sources capture incomplete mobility trips because they do not record all the locations a user has visited. Due to this sparsity issue, estimating travel demand using CDRs is not very feasible [21]. Similarly, sparse traces from social media data yield sparse origin-destination matrices (ODMs) [22].
In order to address the sparsity issue, studies have developed different techniques to fill the gaps in sparse individual mobility traces. Typical techniques include heuristic methods and mathematical models. Heuristic methods that are widely used in processing sparse traces consist of intuitive rules. For example, a CDR entry can be regarded as a stay that lasts for a certain time period, e.g., one hour [23]; the missing entries between 10 pm and 7 am, when a user is assumed to be at home, are filled with the home location estimated based on the user's historical records [21]. When using sparse traces, the reported geolocations need to be processed to become trips. A widely applied practice is to connect the two consecutive geolocations and filter out connections with a time interval longer than a selected time threshold, e.g., 4 hours [22], [24]. However, these heuristic methods using time-based rules are arbitrary. Moreover, such filtering leads to a massive reduction of available data which does not reflect true mobility patterns.
Beyond the heuristic methods, a variety of mathematical models have been designed to bridge the gaps in sparse mobility traces and increase their usability in understanding mobility patterns. Chen et al. (2019) have developed a technique called Context-enhanced Trajectory Reconstruction that completes individual CDR-based traces using tensor factorisation [7]. The synthesised data deliver a trip distance distribution with a better fit, among other key mobility indicators. Their study suggests that filling the gaps in sparse individual traces results in a better representation of travel demand, e.g., the truncated power-law form of the trip distance distribution. Burkhard et al. (2017) have reconstructed regular mobility patterns from users with sparse CDRs using idiosyncratic daily patterns from clustered daily activities [25].
With the exception of these few studies, most studies design methods that directly extract patterns from sparse mobility traces [19], [20], [26]. The generally overlooked bias from data sparsity can affect the observed mobility patterns [7] and limit their usability for travel demand estimation.

B. STUDY OBJECTIVES
Sparse mobility traces collected from CDRs, LBSNs, and social media data have been widely used to study mobility patterns. However, most studies use them directly and ignore the impact of the sparsity issue, or apply simple heuristic methods, both of which lead to results that are potentially biased and inaccurate. In order to extend the use of these data, it is crucial to design appropriate techniques to fill the gaps in sparse mobility traces.
To bridge the gaps in the literature, we propose a mobility model to deal with sparse mobility traces, tested on geolocations of social media data. We calibrate and validate the model with the other established data sources in the form of origin-destination matrices quantifying the daily travel demand in Sweden, the Netherlands, and São Paulo, Brazil. Specifically, we attempt to answer the following research questions:
• Can we develop a model that fills the gaps in sparse mobility data for a more accurate travel demand estimation?
• How well does the model perform compared with heuristic methods?

II. MODEL DESIGN
This section proposes a model that fills the gaps in sparse mobility traces. The model-synthesised data are used to obtain individual trips for synthetic travel demand. We start with a problem statement (Section II-A) defining the sparse input and the synthesised output. In Section II-B, we describe the features extracted from the sparse traces for modelling. Then in Section II-C, we describe how the model components work together to synthesise mobility data.

A. PROBLEM STATEMENT
Part of the mobility traces of a given individual are observed via CDRs or social media platforms over a certain duration, expressed as:
$$\mathit{Trac} = \{(X, Y)_p \mid p = 1, 2, \ldots, N\}$$
where $X$ and $Y$ are the decimal degrees of latitude and longitude, respectively, and $p$ is the chronological order index of the observed visits to a variety of locations, ranging from 1 to the total number of visits by the individual, $N$. Locations are distinguished by their recorded coordinates $(X, Y)$; however, their spatial resolution varies depending on logging noise, cell tower coverage of CDRs, or different social media platforms. Therefore, in practice, some preprocessing is needed to cluster the raw location coordinates so that their spatial resolution is more consistent. After preprocessing the raw data, we refer to a location as a unique pair of GPS coordinates. However, the sparse traces $\mathit{Trac}$ are incomplete: they do not include all the locations visited by an individual, and they are biased by the associated activity, be it tweeting or making a phone call, depending on how frequently, at what time, and where the specific activity is typically performed. In order to fill the gaps, the proposed model takes $\mathit{Trac}$ as input and synthesises a more representative set of mobility data, $\widehat{\mathit{Trac}}$, for travel demand estimation.
As model output, the synthesised mobility traces $\widehat{\mathit{Trac}} = \{(X, Y)_{day,m} \mid 1 \le day \le D,\ 1 \le m \le M_{day}\}$ represent visits of an individual that happen in a series of simulation days ($day$), where $m$ is the chronological order index of a visit to a location $(X, Y)$ in a simulation day. A simulation day is a working unit of how the model generates synthesised data, as specified in Section II-C.3. The total number of simulation days ($D$) is determined when the aggregate output of the model-synthesised data stabilises (see Appendix C). The number of visits per simulation day, $M_{day}$, is empirically determined by looking at how many displacements are usually made by the population or a specific individual. In the experiment of this study, we use the Swedish National Travel Survey (2011-2016) [27] to get the distribution of the number of visits per day across all survey participants. For each simulation day of each individual, $M_{day}$ is randomly drawn from that distribution (detailed in Appendix C).

B. FEATURE EXTRACTION
For a given individual, a number of features can be extracted from the model input Trac. These features are later used for synthesising mobility data.
The set $S$ is defined as a collection of all the distinct locations having different values of $(X, Y)$. The number of distinct locations in $S$ is indicated by $n$. The frequency rate at which they are visited is expressed as $f_j$, $j = 1, 2, \ldots, n$. Among these locations, the home location $s_h$ is identified as the most-visited location between 7 pm and 8 am on weekdays and the whole day on weekends [19], [20], [28].
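The home-detection rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the `(timestamp, location_id)` input layout are assumptions, and timestamps are assumed to already be in local time.

```python
from collections import Counter
from datetime import datetime


def is_home_hours(ts: datetime) -> bool:
    """True all day on weekends, or between 7 pm and 8 am on weekdays."""
    if ts.weekday() >= 5:  # Saturday = 5, Sunday = 6
        return True
    return ts.hour >= 19 or ts.hour < 8


def infer_home(visits):
    """Return the most-visited location during home hours, or None.

    visits: iterable of (timestamp, location_id) pairs (assumed layout).
    """
    counts = Counter(loc for ts, loc in visits if is_home_hours(ts))
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Visits recorded at midday on weekdays are ignored by this rule, so a frequently geotagged workplace does not get mistaken for home.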
The jump size $\theta_p$ connecting two consecutive observed locations, $s_p$ and $s_{p+1}$, is defined as the Haversine distance between them. The bearing $\alpha_p$, referring to the direction from $s_p$ to $s_{p+1}$, is an angle measured clockwise from the north direction. The set of the jump sizes and bearings of all the pairs of consecutive locations in $\mathit{Trac}$ is expressed as $J = \{(\theta_p, \alpha_p) \mid p = 1, 2, \ldots, N - 1\}$.
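As an illustration of how jump sizes and bearings could be extracted from a chronologically ordered trace, the sketch below uses the standard Haversine and initial-bearing formulas on a sphere. The function names, the kilometre unit, and the Earth radius constant are assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Bearing from point 1 to point 2, clockwise from north, in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    x = math.sin(dlmb) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.degrees(math.atan2(x, y)) % 360.0


def jumps(trace):
    """List of (jump size in km, bearing in degrees) for consecutive (lat, lon) pairs."""
    return [
        (haversine_km(*a, *b), initial_bearing_deg(*a, *b))
        for a, b in zip(trace, trace[1:])
    ]
```

A trace of N locations yields N - 1 (jump size, bearing) pairs.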

C. SYNTHESISING MOBILITY DATA
Given $\mathit{Trac}$, the model sets the individual at home ($s_h$) to start the simulation day. As shown in Figure 1, the model generates the next location given the current location $s_p$ with two options: 1) to return to a previously visited location $s_j \in S$, $j \neq p$, with a probability of Prob(return), or 2) to explore a new location with a probability of Prob(explore), where Prob(explore) + Prob(return) = 1. According to the individual mobility model [8], the probability of exploring a new location is expressed as:
$$\mathrm{Prob(explore)} = \rho\, n^{-\gamma}$$
where the greater the $n$, the smaller the probability of exploring a new location, and $\rho$ and $\gamma$ control how much $n$ affects the probability. Given the same $n$ and $\gamma$, the greater the $\rho$, the higher the probability of exploring. Given the same $\rho$, the greater the $\gamma$, the more rapidly Prob(explore) declines as $n$ increases.
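The explore-or-return decision can be sketched in a few lines, using the exploration probability of the individual mobility model [8], ρ n^(-γ). The function names are hypothetical, and clipping the value to [0, 1] is an added safeguard assumed here, not something stated in the text.

```python
import random


def prob_explore(n, rho, gamma):
    """Exploration probability rho * n**(-gamma), clipped to [0, 1] (assumed safeguard)."""
    if n < 1:
        raise ValueError("n must be at least 1 distinct location")
    return min(1.0, max(0.0, rho * n ** (-gamma)))


def choose_action(n, rho, gamma, rng=random):
    """Return 'explore' with probability prob_explore, otherwise 'return'."""
    return "explore" if rng.random() < prob_explore(n, rho, gamma) else "return"
```

Because n grows with every exploration, the individual becomes progressively more likely to return to known places over the course of the simulation.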

1) RETURN TO AN OLD PLACE
If a return is generated, the model moves this individual to a previously visited location in $S$. This location is selected from all the candidate locations in $S$, which have unequal probabilities. The probability of a candidate place $s_{p+1}$ given the current location $s_p$ is determined by two factors: visitation frequency $P(s_{p+1} \mid s_p)$ and impedance to the candidate places $I(s_{p+1} \mid s_p)$.
Visitation frequency: The sparse traces are often collected passively, i.e., opportunistically, owing to their association with certain activities. They therefore have biased visitation frequencies of observed places. For social media data, habitual places such as home and work are much less reported relative to uncommon places [29]. For CDRs, the sparse traces are biased toward locations of phone activities [30]. However, one may expect the rank order of places, based on their visitation frequency from sparse traces, to be preserved, if not the absolute frequency [21]. Therefore, we define visitation frequency as:
$$P(s_{p+1} \mid s_p) \propto k_{s_{p+1}}^{-\zeta} \qquad (2)$$
where $k_{s_{p+1}}$ represents the rank order of location $s_{p+1}$, i.e., it is the $k$th most visited location, and visitation frequency follows Zipf's law $k_{s_{p+1}}^{-\zeta}$ with $\zeta \approx 1.2 \pm 0.1$ [8].
Impedance to the candidate places: The other factor affecting the selection of an old place to return to is the distance (travel impedance) from the current location to the candidate place. Naturally, people are more likely to visit nearby locations than distant ones [18]. In addition, incorporating the travel impedance factor helps to further correct the biases in the rank order of locations in the sparse data, preventing frequency alone from dominating the visitation probability of the candidate places. We define this impedance term $I(s_{p+1} \mid s_p)$ in Eq. (3) as a function of the distance between a candidate place $s_{p+1}$ and the current location $s_p$. To keep the model generic and boundary-free, we use the Haversine distance. The parameter $\beta$ controls the degree to which a given individual is constrained by distance: the higher the $\beta$, the more likely the individual is to visit places nearby. Combining Eqs. (2) and (3), the selection of a return location depends on the distances from the current location to the candidate places as well as on the historical visitation frequency, which indicates the importance of these candidate places.
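To make the interplay of rank and distance concrete, the sketch below weights each candidate by a Zipf rank term multiplied by a distance-decay term and normalises the result. The power-law decay `distance ** (-beta)`, the 0.1 km distance floor, the default parameter values, and all names are assumptions for illustration only; the functional form of the impedance term used by the model is not reproduced here.

```python
import random


def return_probabilities(candidates, zeta=1.2, beta=0.5):
    """Normalised selection probabilities for return candidates.

    candidates: list of (location_id, rank, distance_km) tuples (assumed layout).
    Weight = rank**(-zeta) * distance**(-beta); the decay form is an assumption.
    """
    weights = {}
    for loc, rank, dist_km in candidates:
        d = max(dist_km, 0.1)  # assumed floor to avoid division by zero
        weights[loc] = rank ** (-zeta) * d ** (-beta)
    total = sum(weights.values())
    return {loc: w / total for loc, w in weights.items()}


def pick_return(candidates, zeta=1.2, beta=0.5, rng=random):
    """Draw one return location according to the normalised probabilities."""
    probs = return_probabilities(candidates, zeta, beta)
    locs = list(probs)
    return rng.choices(locs, weights=[probs[loc] for loc in locs], k=1)[0]
```

Under these assumptions, raising `beta` shifts probability mass toward nearby candidates, while `zeta` controls how strongly the top-ranked places dominate.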

2) EXPLORE A NEW PLACE
If exploring a new location, the model moves the individual to an unobserved location $s_{p+1}$ ($s_{p+1} \notin S$). The new location is determined by the current location $s_p$ and a jump size $\theta$ and bearing $\alpha$ randomly selected from $J$, as illustrated in Figure 1-Option 2:
$$s_{p+1} = \mathrm{shift}(s_p, \theta, \alpha)$$
where the function shift computes the coordinates of the new location by moving a distance equal to the jump size $\theta$ in the direction of the bearing angle $\alpha$, measured clockwise from north (zero degrees). Every time a new place is selected, the total number of distinct places visited is updated, $n \leftarrow n + 1$.
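One way to realise the shift function is the standard destination-point formula on a sphere, sketched below. This is an assumed implementation: the text does not state how shift is computed, and the names, the kilometre unit, and the Earth radius constant are choices made for this example.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def shift(lat, lon, theta_km, alpha_deg):
    """Move theta_km from (lat, lon) along bearing alpha_deg (clockwise from north)."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    delta = theta_km / EARTH_RADIUS_KM  # angular distance
    brg = math.radians(alpha_deg)
    phi2 = math.asin(
        math.sin(phi1) * math.cos(delta)
        + math.cos(phi1) * math.sin(delta) * math.cos(brg)
    )
    lam2 = lam1 + math.atan2(
        math.sin(brg) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2),
    )
    lon2 = (math.degrees(lam2) + 540.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return math.degrees(phi2), lon2


def explore(current, jump_set, rng=random):
    """Pick a random (jump size, bearing) pair and apply it to the current (lat, lon)."""
    theta_km, alpha_deg = rng.choice(jump_set)
    return shift(current[0], current[1], theta_km, alpha_deg)
```

Sampling the jump size and bearing as a pair, rather than independently, keeps the observed association between how far and in which direction the individual tends to move.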

3) GENERATE SIMULATION DAYS
For a simulation day with $M_{day}$ visits, the individual departs from $s_h$ to visit a series of locations, where the last one is the home location $s_h$. For the remaining $M_{day} - 1$ visits, each location is created by either returning to an old place (Section II-C.1) or exploring a new place (Section II-C.2). As illustrated in Algorithm 1, after the specified simulation days ($D$) are finished, the synthesised mobility data of the individual ($\widehat{\mathit{Trac}}$) have been generated from the sparse input $\mathit{Trac}$.
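The day-generation loop can be outlined as below. This is a rough sketch rather than Algorithm 1 itself: it assumes each day ends back at home, takes the return and explore steps as caller-supplied callbacks (`do_return`, `do_explore`), and uses hypothetical names throughout.

```python
import random


def simulate_day(home, m_day, n, rho, gamma, do_return, do_explore, rng=random):
    """Generate m_day visits for one simulation day; returns (visits, updated n).

    The first m_day - 1 visits are explore/return steps; the last visit is
    home (an assumption of this sketch).
    """
    if m_day < 1:
        raise ValueError("m_day must be at least 1")
    visits, current = [], home
    for _ in range(m_day - 1):
        if rng.random() < min(1.0, rho * n ** (-gamma)):
            current = do_explore(current)
            n += 1  # a new distinct location has been added
        else:
            current = do_return(current)
        visits.append(current)
    visits.append(home)
    return visits, n


def simulate(home, n, days, draw_m_day, **step_kwargs):
    """Run `days` simulation days; returns {day: [visits]}."""
    trace = {}
    for day in range(1, days + 1):
        trace[day], n = simulate_day(home, draw_m_day(), n, **step_kwargs)
    return trace
```

Here `draw_m_day` stands in for sampling the number of visits per day from an empirical distribution, and n is carried across days so that exploration keeps slowing down as more places are discovered.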

III. MODEL EXPERIMENT
Considering the ground-truth data availability and the potential impact of geographical scales on the model performance, we select Sweden, the Netherlands, and São Paulo to do the model experiment. These three regions have similar population sizes but distinct areas; São Paulo is a metropolitan area whereas Sweden and the Netherlands are two countries of different sizes (detailed in Table 1). Specifically, we use a benchmark model (Section III-A) with geotagged tweets as an example of sparse mobility traces (detailed in Appendix B).
As illustrated in Figure 2, we first construct models for the three study areas and calibrate the models against the official travel survey data as the ground truth to find the optimal parameters. The aim of the experiment (Section III-B) is to see how the model performs in representing the travel demand, as quantified by the aggregated population flows between spatial zones, when validated against official data sources. The model performance is evaluated by comparing the ODM and its trip distance distribution with the ground truth in contrast with the benchmark model.

A. BENCHMARK MODEL
To assess the performance of the model's synthetic travel demand estimation, we create a benchmark model using a common heuristic method of generating an origin-destination matrix (ODM) from sparse mobility traces. The benchmark model converts the displacements between two consecutive geolocations generated by the same individual with a time interval below 24 hours into trips [22], [24], [31]. The origin-destination pairs of these converted trips are spatially aggregated over all the covered individuals to form the benchmark ODM, which is compared with the ground truth alongside the proposed model. The performance gain of the proposed model over the benchmark model quantifies to what extent the proposed model corrects the biases in sparse traces, thus contributing to an improved travel demand estimation at the aggregate level.
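The benchmark heuristic can be sketched as follows. The record layout `(user_id, timestamp, location)`, the `zone_of` lookup, and the choice to skip consecutive records at the same location (so that only displacements count as trips) are assumptions of this example.

```python
from collections import Counter
from datetime import datetime, timedelta


def benchmark_odm(records, zone_of, max_gap=timedelta(hours=24)):
    """Build an OD matrix from consecutive geolocations of the same user.

    records: iterable of (user_id, timestamp, location) tuples (assumed layout).
    zone_of: callable mapping a location to its spatial zone id.
    Returns a Counter {(origin_zone, destination_zone): number of trips}.
    """
    by_user = {}
    for user, ts, loc in records:
        by_user.setdefault(user, []).append((ts, loc))

    odm = Counter()
    for visits in by_user.values():
        visits.sort(key=lambda v: v[0])  # chronological order per user
        for (t0, a), (t1, b) in zip(visits, visits[1:]):
            # keep only displacements with a time interval below the threshold
            if a != b and (t1 - t0) < max_gap:
                odm[(zone_of(a), zone_of(b))] += 1
    return odm
```

Dividing each count by the total number of trips gives the trip frequency rates that are compared with the ground truth.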

B. MODEL EXPERIMENT
The preprocessed sparse geolocations, as described in Appendix B, are ordered chronologically and divided into two equal-length parts, one for calibration and the other for validation. With an initial parameter setting, the model takes in the sparse traces of each individual ($\mathit{Trac}$) to generate synthesised visits ($\widehat{\mathit{Trac}}$). All the individuals' visits are then aggregated to the spatial zones used in the ground-truth data to calculate the ODM. The calculated ODM is compared with the ground truth in terms of the trip distance distribution using the Kullback-Leibler (KL) divergence measure [20], [32], [33]. A small KL divergence value indicates that the two distributions are similar. The optimal model parameters are those that yield the smallest KL divergence, found with Bayesian optimisation. The model with optimal parameters is applied to the validation dataset, and the performance (KL divergence from the ground truth) is compared to that for the calibration dataset.

1) MODEL SETTINGS
The initial model setup is illustrated in Algorithm 1 (Data). Apart from the input of sparse traces ($\mathit{Trac}$), the model has a few parameters that need to be set in order for it to synthesise mobility data. The meanings and values of these parameters are displayed in Table 1. Prob(F) is the probability distribution of F, the number of visits to locations per day, which is empirically derived from the Swedish National Travel Survey (2011-2016) [27]. See the detailed distribution in Figure 1. $D$ is determined by exploring how the model's performance, measured by KL divergence, varies with the number of simulation days (detailed in Figure 1). For three of these parameters, $\rho$, $\gamma$, and $\beta$, Bayesian optimisation on model outputs against the ground-truth data is used to specify the values within the intervals in Table 1. This is introduced in the rest of this section.

2) GROUND-TRUTH DATA
We use the travel survey data covering detailed trip information, such as the origin, destination and distance for individual trips, from three selected regions as shown in Figure 3. Given that some validation data only report weekday travel, for the sake of consistency, we focus on weekday trips.
Sweden: The Swedish National Travel Survey collects one-day travel diaries for 2011 to 2016 [27]. The survey includes 171,553 trips from 38,258 participants with 2,189 record days [20]. This dataset contains the origins and destinations of trips as well as trip distance. The spatial resolution is the DeSO zone defined as 5,984 demographic statistics areas by Statistics Sweden.
The Netherlands: The dataset of daily mobility OViN (Onderzoek Verplaatsingen in Nederland) [34] is a survey conducted in 2017 with 37,016 respondents at the national level. All trips originate and end in postal code areas, grouped by their first four digits. In total, there are 4,066 zones.
São Paulo, Brazil: The OD survey [35] carried out in 2017 interviewed 32,000 households (100,000 people) for their recorded weekday. There are 517 spatial zones, of which 342 zones correspond to the municipality of São Paulo, and the rest cover the neighbouring municipalities. This dataset does not have detailed trip distances. The trip distances of the OD pairs are calculated based on the Haversine distance between the centroids of the corresponding origin and destination zones.

3) BAYESIAN OPTIMISATION
In the optimisation process, we aim to find the optimal values of the undetermined parameters listed in Table 1 so that the calibrated model approximates the ground truth as closely as possible. Bayesian optimisation is a global optimisation that does not specify any forms of functions; it finds the optimal parameters given the objective function by taking advantage of the full information provided by the history of the optimisation [36].
In this study, the objective function, the KL divergence, is defined as:
$$D_{KL}(P \parallel Q) = \sum_{d \in d_{group}} P(d) \log \frac{P(d)}{Q(d)}$$
where $d_{group}$ is a set of quantile-based distance groups (100 quantiles) based on the spatial zones of the study area, $P(d)$ is the frequency rate of trips that fall in a given distance group $d \in d_{group}$ based on the ground-truth data, and $Q(d)$ is the frequency rate of trips in the same distance group based on the model output, i.e., its synthesised mobility data from all the individuals $i = 1, 2, \ldots, I$. The target parameters $\rho$, $\gamma$, and $\beta$ take the optimal values that maximise $-D_{KL}$ (i.e., minimise $D_{KL}$).
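As a small illustration of this objective, the sketch below bins trip distances into groups and computes the KL divergence of the model distribution Q from the ground-truth distribution P. The bin edges are passed in by the caller, and the natural logarithm and the small `eps` guard against empty model bins are assumptions of this example.

```python
import math
from bisect import bisect_right


def frequency_rates(distances, edges):
    """Share of trips per distance group; edges are ascending upper bin boundaries."""
    counts = [0] * (len(edges) + 1)
    for d in distances:
        counts[bisect_right(edges, d)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_d P(d) * log(P(d) / Q(d)).

    Groups with P(d) = 0 contribute nothing; eps guards against Q(d) = 0
    (an assumed smoothing choice).
    """
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )
```

A calibration loop would evaluate `kl_divergence` for each candidate parameter set and keep the set with the smallest value.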
We use a constrained global optimisation package in Python that is built upon Bayesian inference and Gaussian process [37]. The technique is chosen over other alternatives, e.g., a grid search, due to the high computation cost of calculating the objective function starting with sparse traces. Moreover, this technique allows a balance between exploration and exploitation in searching for the optimal parameters [37].

IV. RESULTS
In this section, we first present the model calibration and validation results with the optimal parameters (Section IV-A) and then test the model's performance in representing travel demand (Section IV-B), and the impact of trip distance and length of sparse traces on the model's performance (Section IV-C). At the end of this section, we discuss model parameter transferability (Section IV-D).

A. CALIBRATED MODELS FOR SWEDEN, THE NETHERLANDS, AND SÃO PAULO, BRAZIL
In model calibration, the Bayesian optimisation searches over the parameters' value space to find the optimal set of $\rho$, $\gamma$, and $\beta$ for the three case study regions. The results are presented in Figure 4. In the search through the parameter space, the KL divergence varies similarly for the three geographical regions. Table 2 summarises the optimal model parameters and corresponding model performance in terms of KL divergence for the calibration and validation datasets. The performance difference between the calibration and validation datasets is small for the Netherlands and São Paulo; it is slightly greater for Sweden. Compared with the benchmark, the proposed model approximates the ground truth better: from the benchmark to the proposed model, KL divergence decreases by 67%–96% for the calibration data and by 35%–98% for the validation data.
Figure 5 shows an example of generated individual ODMs using the benchmark model vs. the proposed model based on sparse geolocations of an individual covering 315 days. In Figure 5(a), the sparse geolocations are directly used by the benchmark to produce the individual ODM, resulting in 64 spatial zones between which the trips are created. The proposed model, on the other hand, fills the gaps in the sparse data, resulting in more diverse synthetic trips covering 123 spatial zones (Figure 5(b)). For both ODMs from the benchmark and the proposed model, the blue arc represents the home location. For daily travel, many trips originate from or are attracted to the home. We see that the proposed model reflects this pattern better than the benchmark.

B. POPULATION FLOWS: ODMS AND DISTANCE DISTRIBUTION
We quantify the population flows between the spatial zones in the study areas by aggregating the results of all the individuals from the proposed and benchmark models, and compare with the trips in the ground-truth data. Compared with the ground-truth data, four trip frequency rate values are calculated from the proposed model vs. the benchmark model, using the calibration data vs. the validation data. These four model-based frequency rates are each compared with the one from the ground-truth data.
As illustrated in Figure 6, if the model performs the same as the ground truth, all points will fall on the diagonal line. We see that the model generally performs better for OD pairs of higher frequency rate than for those of lower frequency rate. We also observe that the performance varies between the three regions. Compared with the benchmark model results, the proposed model generates more representative trips that generally approximate the ground truth better.
Besides the visualisation in Figure 6, we use two indicators, Kendall's tau and the Sørensen-Dice similarity index (SSI) [38], to further compare the performance of the proposed and benchmark models. Kendall's tau quantifies the correlation of the trip frequency rate of all the spatial zones between the ground truth and the model vs. the benchmark outputs. The SSI takes values between 0, when there is no similarity, and 1, when the model output and the ground-truth data are identical. The results, averaged over validation and calibration, are shown in Table 3.
The similarity scores (KL divergence) between the trip distance distribution of the ground truth and those of the model and benchmark outputs are included in the CDF plots of Figure 7. The proposed model approximates the ground truth better than the benchmark model, i.e., the blue curves are closer to the orange curve than the green curves. Moreover, the benchmark model tends to underestimate the trip distance. For example, trips below 10 km account for 75%–90% of total trips in all three regions according to the benchmark models. However, the ground-truth data and the model outputs suggest shares of approximately 75%, 80%, and 55%, depending largely on the region. The overall similarity results are consistent with the results of ODMs. In all three regions, the model applied to the calibration dataset approximates the ground-truth data slightly better than the one applied to the validation dataset.
For ODMs and distance distributions, the proposed model generally performs better than the benchmark model. There is one exception for Sweden: the similarity of ODMs between the model output and the ground-truth data is the same as or worse than that of the benchmark, although its KL divergence indicates better performance than the benchmark. In summary, there is a consistent regional difference for both ODMs and distance distribution: the proposed model performs the best in São Paulo, followed by the Netherlands, and then Sweden.
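For reference, a Sørensen-Dice similarity between two OD matrices can be computed as sketched below. The formula used here, twice the shared flow divided by the total flow of both matrices, is the form commonly applied to mobility flows and is an assumption; the text cites [38] without stating the exact expression. The dictionary layout and function name are also hypothetical.

```python
def sorensen_dice(flows_a, flows_b):
    """Sørensen-Dice similarity between two OD matrices.

    flows_a, flows_b: dicts {(origin, destination): flow or frequency rate}.
    Returns a value in [0, 1]: 0 for no shared flow, 1 for identical matrices.
    """
    keys = set(flows_a) | set(flows_b)
    shared = sum(min(flows_a.get(k, 0.0), flows_b.get(k, 0.0)) for k in keys)
    total = sum(flows_a.values()) + sum(flows_b.values())
    return 2.0 * shared / total if total > 0 else 0.0
```

When both inputs are frequency rates that each sum to 1, this reduces to the sum over OD pairs of the smaller of the two rates.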

C. IMPACT OF TRIP DISTANCE AND LENGTH OF SPARSE TRACES
How well the model approximates the ground-truth trip frequency rate depends on trip distance and region (Figure 8). The model output is very close to the ground-truth data for the most frequent trip distance range (1-10 km). For the rest of the trip distance ranges, the model slightly underestimates the trip frequency for distances between 10-30 km and overestimates it above 30 km up to 100-300 km in the two countries (Figure 8a-d). When the trip distance increases above 100-300 km, the trip frequency in the ground-truth data starts to fluctuate, and its difference from the model output grows. For São Paulo (Figure 8e-f), the model approximates the ground truth well, as opposed to the benchmark, which greatly overestimates short-distance trips below 3 km. The model output is similar to the ground-truth data for the rest of the distance ranges up to 40 km. However, the model overestimates the occurrence of long-distance trips above 40 km within São Paulo.
The similarity between the model output and the ground truth of trip distance distribution depends on data length (Figure 9). We consider two types of data length: the total number of geolocations and the maximum number of geolocations used for each individual. The more geolocations we have in our model, the better its output resembles the ground truth (Figure 9a). For all the regions, we see a continuous increase in performance (declining KL divergence), and this trend holds even after we include all the individuals' data, especially for Sweden, the largest among the study areas, whose performance is far from saturation, unlike São Paulo. However, the model performance is not sensitive to increasing the maximum number of geolocations of each individual (Figure 9b). A maximum of 200 geolocations per individual appears to suffice for generating a trip distance distribution similar to the ground truth, even though many individuals have far fewer than 200 (median about 140 per individual). Figure 9 suggests that a dataset covering a large enough population with a relatively small number of individual geolocations can be enough for the model to generate sensible travel demand.

D. PARAMETER TRANSFERABILITY
Consistent ground-truth data are not always available for those regions where one can collect sparse mobility data. If the good performance of the proposed model relies largely on external data sources to calibrate its parameters, its application is limited. Can we apply a set of parameters learned from one region's ground-truth data to another region without compromising the performance too much? To answer this question, we test how transferable the calibrated parameters are from one region to another. For each of the three regions, we run the model to synthesise mobility data with the calibrated parameters of the other two regions and with the average value of the parameters of all three regions, and compare the results with the ground truth. The performance gain is calculated as the relative decrease in KL divergence of the model compared with the benchmark, in %. A negative performance gain indicates that the proposed model performs worse than the benchmark model. The relative performance, i.e., how well the model parameters of one region work in another, is quantified by the ratio of the performance gains (applying region / ego region). For a region using its own model parameters, this relative performance is 100%. Figure 10 shows the results of this transferability test. Except for the use of Sweden's parameters on the Netherlands, we observe only a small variation in relative performance. This indicates that the model performance is not very sensitive to changes in the parameter values given a certain level of knowledge, which is promising for reaching good performance when using the calibrated parameters in other regions with similar sparse data. It is worth noting that, in some cases, the relative performance is above 100%, which means that another region's model parameters perform better than the ones found for the region itself. This is because Bayesian optimisation only approximates the optimal parameters, so the calibrated values for a region are not guaranteed to be its true optimum.
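The two quantities defined above can be written out as a short sketch. The function names and the KL divergence values below are hypothetical and serve only to illustrate the arithmetic; they are not results from this study.

```python
def performance_gain(kl_model, kl_benchmark):
    """Relative decrease of KL divergence versus the benchmark, in %.
    Negative when the model performs worse than the benchmark."""
    return 100.0 * (kl_benchmark - kl_model) / kl_benchmark


def relative_performance(gain_applied, gain_ego):
    """Ratio of performance gains (applying region / ego region), in %.
    Equals 100 when a region uses its own calibrated parameters."""
    return 100.0 * gain_applied / gain_ego


# Hypothetical KL divergences for one region (illustration only).
kl_benchmark = 0.40      # benchmark model
kl_own_params = 0.20     # model with the region's own parameters
kl_other_params = 0.22   # model with another region's parameters

gain_ego = performance_gain(kl_own_params, kl_benchmark)        # about 50 %
gain_applied = performance_gain(kl_other_params, kl_benchmark)  # about 45 %
ratio = relative_performance(gain_applied, gain_ego)            # about 90 %
```

Note that the ratio is only easy to interpret when the ego region's gain is positive, i.e., when the model with its own parameters outperforms the benchmark.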

V. DISCUSSION AND CONCLUSION
This study proposes a model that fills the gaps in sparse mobility traces. The synthesised mobility data can be used for quantifying travel demand in terms of population flows and trip distance distributions. The proposed model extends the fundamental mechanisms of exploration and preferential return to synthesise mobility [8], and is tested on sparse individual traces found in geolocated social media data.
The proposed model generally performs better than the benchmark (heuristic) model in terms of quantifying population flows and trip distance distribution. Compared with other methods addressing sparsity issues, the proposed model has a few advantages. First, instead of reconstructing trajectories, which risks invading privacy, our model estimates travel demand based on collective travel patterns. Second, it is based on fundamental mechanisms of human mobility, expressed in a simpler form than in previous studies [7]. Third, the proposed model shows good performance on real-world data that are very sparse (around 1.5 geolocations per day covering 100 days on average). This level of sparsity is higher than in previous studies using CDRs [7], [25], [39].

A. MODEL DESIGN FOR SPARSE TRACES
Sparse mobility traces are often collected passively, only when the phone users are engaged in certain phone activities: making a call, messaging, tweeting with a geotag, or using location-aware applications. Hence, these geolocations are incomplete and sparse observations of the individuals' mobility. In a previous study of sparse geolocations from Twitter, we found that the long-term observation of individual geolocations captures both routine mobility and occasional exploration of new places [40], although the proportion of regular locations to uncommon places deviates from the users' actual mobility [29]. Therefore, we follow the assumption that the rank order of places, based on their visitation frequency in sparse traces, is preserved [21].
Informed by the literature, we make two design choices in the model to account for the sparsity issue. First, we use the visitation frequency obtained from Zipf's law when designing the probability function for returning to an old place (Section II-C.1), instead of a visitation frequency calculated directly from the sparse input. In doing so, we attempt to exclude the bias of over-representing uncommon places in the sparse geolocations. Second, we create a two-dimensional distribution of jump size (trip distance) and bearing for exploring a new place, instead of replicating the biased displacements in the sparse traces (Section II-C.2). This distribution is shaped by the individual's returning and exploring behaviour observed in the sparse input, and the visits to new places are constrained by where the individual lives and stays most of the time. The second design choice is similar to a study that introduces the heterogeneity of visiting directions [18] to the individual mobility model [8]. The difference is that we consider this directional preference at the individual level. In contrast, they consider how a large group of people influence each other, i.e., people tend to visit places that are frequently visited by others, based on their empirical findings [18]. The integrated heterogeneity of visiting directions provides more spatial detail. With these two design choices, the proposed model synthesises the sparse traces into more representative mobility data.
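The first of these two ideas can be illustrated with a minimal sketch: only the rank order of places is taken from the sparse trace, and the return probabilities follow a Zipf-like law over those ranks. The exponent `zeta`, the function names, and the toy trace are assumptions for illustration; the actual probability function is the one defined in Section II-C.1.

```python
import random
from collections import Counter


def zipf_return_weights(visits, zeta=1.2):
    """Return probabilities proportional to rank**(-zeta).
    Only the rank order of places is taken from the sparse trace;
    the observed counts themselves are not used as probabilities.
    zeta is a hypothetical exponent, not a calibrated value."""
    counts = Counter(visits)
    ranked = [place for place, _ in counts.most_common()]
    raw = [(k + 1) ** -zeta for k in range(len(ranked))]
    total = sum(raw)
    return {place: w / total for place, w in zip(ranked, raw)}


def sample_return_place(weights, rng):
    """Draw one previously visited place according to the Zipf weights."""
    places = list(weights)
    return rng.choices(places, weights=[weights[p] for p in places], k=1)[0]


# Toy sparse trace of place labels (illustration only).
trace = ["home", "work", "home", "cafe", "home", "work"]
weights = zipf_return_weights(trace)
next_place = sample_return_place(weights, random.Random(0))
```

Because the weights depend on rank alone, a rarely visited place that happens to be over-represented in the geotagged sample does not gain probability beyond what its rank implies.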
The proposed model protects personal data and privacy by 1) clustering raw geolocations for identifying the home regions (see Appendix B) and 2) not reconstructing individual mobility trajectories that could potentially reveal the precise movement of each data contributor; instead, it creates synthetic mobility data from sparse inputs. The objective of the proposed model is to fill the gaps in sparse traces so that the synthesised mobility data are more representative of average daily visits and total distance travelled for further aggregation. Apart from constructing ODMs, we can also develop activity-based models driven by the model-synthesised data for simulating individuals' daily activities. Using these synthetic data from easy-to-access geolocation big data, we can provide more timely and realistic trip data than traditional data-driven approaches [41].

B. MODEL PERFORMANCE
We use the available travel survey data to calibrate the customisable parameters of the proposed model for Sweden, the Netherlands, and São Paulo (Section III-B). It is worth noting that the model is designed in such a way that if there are "ground-truth" trajectories, the model can be calibrated against these data. In reality, it is difficult to access high-quality ground-truth data, which are often either non-existent or outdated. Therefore, in this study, we calibrate the model against population-level data for three selected regions. The difference between the results using the calibration and the validation sample is small (Table 2). There are regional differences between the model outputs for the three regions. Overall, the model for São Paulo performs better than the ones for Sweden and the Netherlands. One reason for this relates to how the individuals' home locations are distributed across the study area. Previous studies have suggested that most active Twitter users live in urban areas [40], [42] and that using sparse geolocations of Twitter data for simulating travel demand is more suitable for urban residents than for the population as a whole. Another reason is that São Paulo has the smallest area but the greatest number of individuals and geolocations in the sparse traces. Given the impact of data length on the model performance (Figure 9), abundant data may contribute to its best performance among the three study areas. The same reasoning may explain the less ideal model performance in Sweden, where performance may be further improved by covering a larger population (Figure 9a). A third reason may be the modifiable areal unit problem (MAUP), a phenomenon where spatial results vary depending on how the study area is divided into smaller analysis units [43], [44], [45]. We could not use a consistent gridding system to compare the model performance because of the predefined region-specific spatial zones of the ground-truth data. Therefore, the origins and destinations of trips are aggregated to different spatial zones for the three regions. With more precise ground-truth data, the model can be further investigated using a uniform gridding system to exclude the MAUP effect.
Based on the results of the parameter search, we observe a large region of the parameter space in which the model performance is robust to moderate changes in the values of the three parameters (Figure 4). Our results suggest that the parameters calibrated for one region are transferable to another (Figure 10), except when using Sweden's parameters on the Netherlands. The exception may be due to the distinct geographical scales of these two countries and the MAUP issue. A more in-depth analysis and a broader test of the model in different regions are needed to better understand the reasons. In general, the proposed model with the parameters' average values has the potential to be applied to other regions in the absence of ground-truth data.

C. LIMITATIONS AND FUTURE WORK
The proposed model for filling the gaps in sparse mobility traces has some limitations. (1) The proposed model synthesises mobility data by filling in the data gaps of sparse individual traces. However, due to the lack of matching individuals, our validation data represent the aggregated pictures of population flows and trip distances from daily trips. More steps can be taken to address the inherent inconsistency between the proposed individual-based model and its calibration against population data. One future direction is to test the performance of the proposed model using high-resolution GPS data: with a more complete set of mobility trajectories, we can simulate a variety of sparsity levels by downsampling the observed locations and evaluate the impact of sparsity on the model's performance. (2) The model can be extended in future studies by integrating spatial context and temporal dimensions [7] to account for the types of activities based on the semantic context of historical trips. These improvements can make the synthesised mobility data more useful in transport planning. (3) The model simulates daily trips that always return home; therefore, an important aspect of mobility, overnight trips, is yet to be integrated in future improvements. (4) We use geolocations from Twitter as an example of sparse traces. However, Twitter has recently changed its policy, making the geolocations less precise [46]. Although this study uses data collected before this significant change, future work will need to test the feasibility of the proposed model using more sources of sparse traces, such as mobile application data, from more regions.
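The downsampling experiment suggested in limitation (1) could be set up along the following lines. This is a sketch under a strong simplifying assumption: observations are thinned uniformly at random, whereas real sparse traces are biased towards the places and times where users are active. The function name, the record layout, and the toy trajectory are hypothetical.

```python
import random


def downsample_trace(trace, keep_fraction, seed=0):
    """Keep a random subset of a time-ordered trace, preserving order.
    Uniform thinning is an assumption made here for simplicity."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not trace:
        return []
    rng = random.Random(seed)
    n_keep = max(1, round(keep_fraction * len(trace)))
    kept = sorted(rng.sample(range(len(trace)), n_keep))
    return [trace[i] for i in kept]


# Toy dense trajectory: (timestamp index, place label), illustration only.
dense = [(t, "place_%d" % (t % 5)) for t in range(100)]

# One sparse version per sparsity level; each would be fed to the model
# and the output compared with statistics from the dense trajectory.
sparse_versions = {f: downsample_trace(dense, f) for f in (0.5, 0.1, 0.02)}
```

Comparing the model output across such sparsity levels against the dense trajectory would indicate how quickly performance degrades as observations become rarer.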

APPENDIX A NOTATIONS
The main symbols used in this manuscript and their definitions are briefly summarised in Table 4.

APPENDIX B DATA DESCRIPTION OF APPLIED SPARSE TRACES
Geotagged tweets are a typical source of sparse mobility traces. Twitter users can choose to geotag tweets, in which case the social media data include geolocation information. One can collect tweets from the Twitter User Timeline API to get a maximum of 3,200 tweets from a Twitter user's history, of which a (small) portion are geotagged. We purchased data from a Twitter subsidiary, Gnip, to get a complete archive of geotagged tweets from a six-month period (20 Dec 2015 to 20 Jun 2016), generated within the study areas: Sweden, the Netherlands, and São Paulo, Brazil. Using this Gnip dataset, we identified the top geotag users, i.e., those who generated at least 30 geotagged tweets during the data collection period. For the model experiment, we collected their user timelines to get their historical geotagged tweets.
Before these geotagged tweets can be used, we carefully preprocess them to reduce artefacts [20]. We remove: 1) users who only geotag tweets at a single place, as they are suspected to be bot accounts, e.g., for job postings or weather updates; 2) tweets for which the Twitter user posts a place's location, e.g., the centre of a country, instead of the tweet's precise GPS coordinates; 3) those top geotag Twitter users who nevertheless have too few (< 20) geotagged tweets; and 4) tweets from before an apparent move to a study region. To protect privacy, we further cluster raw geolocations using DBSCAN so that the identified home location refers to an area [47] instead of a precise point on the map. The distance threshold for merging is set to 0.1 km. The minimum number of locations for a region is set to 1.
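The clustering step can be sketched without external libraries. With the minimum number of locations set to 1, every point is a core point in DBSCAN, so the clusters coincide with the connected components of the graph that links points lying within the distance threshold of each other. The sketch below relies on this equivalence. The haversine distance, the function names, and the example coordinates are assumptions for illustration, and the quadratic pairwise loop is only suitable for small inputs.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def cluster_geolocations(points, eps_km=0.1):
    """Cluster labels equivalent to DBSCAN with a minimum of 1 point:
    connected components of the graph linking points within eps_km."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(points[i], points[j]) <= eps_km:
                parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


# Hypothetical coordinates: two points a few tens of metres apart
# and a third point more than 1 km away.
points = [(59.3293, 18.0686), (59.3295, 18.0688), (59.3400, 18.0686)]
labels = cluster_geolocations(points)
```

Each resulting cluster then stands for an area rather than a single coordinate, which is what allows the home location to be reported as a region.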
The geotagged tweets after the above preprocessing are summarised in Table 5. The sparsity of the data is observed in all three regions, given that the number of geolocations per day ranges from 1.4 to 3.2, with all having fewer than two locations, which is far lower than the typical number of visits per day, such as 3.1 for Sweden. This makes it challenging to directly use these sparse traces to adequately model travel demand [20].

APPENDIX C DETERMINATION OF MODEL PARAMETERS M DAY AND D
The model parameter M_day determines how many visits to generate for each simulation day, which can be empirically informed. In this study, we use the Swedish National Travel Survey to get the distribution (Figure 11a) from which the model draws M_day. The model parameter D determines the number of simulation days for which data are generated, and it can be set experimentally. In this study, we compare the model-synthesised ODMs with the ground truth (quantified by KL divergence) using varying values of D. Because the model output is stochastic, more than one simulation day is needed to obtain stable individual mobility trajectories. Figure 11b suggests a stabilised KL divergence after 260 days for all three regions. To balance model performance and computational efficiency, we set D to 260.
Figure 12 shows the trip frequency rate between zones from the validation results, comparing the model output with the benchmark output. The overall trend is consistent with the results from the calibration dataset, as shown in Figure 6.
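Returning to the two parameters of this appendix, the following sketch shows how the number of visits per simulation day could be drawn from an empirical distribution for D simulation days. The probability values in `visit_pmf` are hypothetical placeholders and are not the survey distribution of Figure 11a; the function name is likewise an assumption.

```python
import random


def draw_daily_visits(visit_pmf, n_days=260, seed=0):
    """Draw the number of visits for each of n_days simulation days
    from an empirical distribution {number of visits: probability}."""
    rng = random.Random(seed)
    values = list(visit_pmf)
    probs = [visit_pmf[v] for v in values]
    return rng.choices(values, weights=probs, k=n_days)


# Hypothetical distribution of visits per day (illustration only).
visit_pmf = {2: 0.4, 3: 0.3, 4: 0.2, 5: 0.1}
visits_per_day = draw_daily_visits(visit_pmf, n_days=260)
mean_visits = sum(visits_per_day) / len(visits_per_day)
```

Increasing `n_days` reduces the sampling noise in aggregate statistics such as `mean_visits`, which mirrors the stabilisation of the KL divergence with growing D at the cost of longer computation.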