Modeling Censored Mobility Demand through Quantile Regression Neural Networks

Shared mobility services require accurate demand models for effective service planning. On the one hand, modeling the full probability distribution of demand is advantageous because the entire uncertainty structure preserves valuable information for decision-making. On the other hand, demand is often observed through the usage of the service itself, so that the observations are censored, as they are inherently limited by available supply. Since the 1980s, various works on Censored Quantile Regression models have performed well under such conditions. Further, in the last two decades, several papers have proposed to implement these models flexibly through Neural Networks. However, the models in current works estimate the quantiles individually, thus incurring a computational overhead and ignoring valuable relationships between the quantiles. We address this gap by extending current Censored Quantile Regression models to learn multiple quantiles at once and apply these to synthetic baseline datasets and datasets from two shared mobility providers in the Copenhagen metropolitan area in Denmark. The results show that our extended models yield fewer quantile crossings and less computational overhead without compromising model performance.

insufficient supply. The application of censored modeling and supply balancing for bike-sharing services is analyzed in [5].
Previously, Gaussian Processes have been the go-to model for censored mobility demand, as they yield a complete distribution of latent demand [6]. However, while Gaussian Processes allow for a flexible, non-parametric fit, they still impose a Gaussian assumption on the latent distribution and face limitations when scaling to large datasets. The latter issue is becoming more important in transportation research, where large datasets are increasingly used [7], [8], so that adequate censored modeling requires models that scale seamlessly.
This work proposes to model latent mobility demand via Multi-Output Censored Quantile Regression Neural Networks (Multi-CQNN). These models do not face the limitations mentioned above, as they make no assumptions on the parametric form of the latent distribution and scale well to large datasets. As their name implies, Multi-CQNN models estimate multiple quantiles of the predictive distribution while accounting for censorship in the observed demand. By being multi-output, they also address two drawbacks of single-output CQNN models, where estimating quantiles individually both incurs a significant computational overhead and yields crossing quantiles, wherein a lower quantile crosses a higher one [9]. We demonstrate the advantages of Multi-CQNN empirically on synthetic data as well as real-world shared mobility data.
To the best of our knowledge, this is the first work to apply (Multi-)CQNN in the transport domain. Furthermore, whereas existing works on CQNN often assume fixed censorship thresholds, we experiment with dynamic and random censorship thresholds, which make for a more complex modeling setting. We also provide a Python implementation of Multi-CQNN in https://github.com/inon-peled/CQRNN-pub/.
The rest of this work is organized as follows. In Section II, we review works related to CQNN and identify knowledge gaps, particularly in the transport domain. Section III then describes our Multi-CQNN methodology. Next, we demonstrate the advantages of our methodology via experiments: first with synthetic censored data in Section IV, then with real-world censored data from two shared mobility services (bikes and electric cars) in Section V. Finally, Section VI summarizes our findings and outlines our future work plans.

II. LITERATURE REVIEW
In this Section, we review existing works on non-censored and censored Quantile Regression (QR), and in particular, Neural Network-based QR. We focus our review on censored QR approaches, as there also exists plenty of work other than QR to model censored data. For example, [10] and [11] systematically analyze and study the difference between point and section data from different sensors, to predict travel speeds and the effect on mobility service.
arXiv:2104.01214v2 [cs.LG] 9 Jul 2022
For the more general topics of Censored Regression and Neural Networks, we kindly refer the reader to the following resources. A review of censored and non-censored methods for mobility demand modeling appears in our recent joint work with [6]. The general theory and practice of Neural Networks is well studied in [12] and [13], and several of their recent applications in the transport domain are reviewed in [14] and [7]. Let us first introduce the general method of Quantile Regression [15], regardless of censorship.
For any probability distribution and 0 < θ < 1, the θ'th quantile is the smallest value at which the cumulative probability mass is at least θ. QR approximates a latent distribution by estimating its quantiles for given θ's, and thus does not presume any parametric form for the distribution. The regression itself can follow any functional form (e.g., linear [15], nonlinear [16], multivariate [17] or nonparametric [18]), and the quantiles can be combined into a fully estimated distribution [19], [20].
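As a small illustration of the quantile definition above, the following sketch computes an empirical θ'th quantile as the smallest sample value whose empirical cumulative mass reaches θ. The function name `empirical_quantile` is hypothetical and is not part of the paper's implementation.

```python
def empirical_quantile(samples, theta):
    """Smallest sample value whose empirical CDF is at least theta."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    xs = sorted(samples)
    if not xs:
        raise ValueError("samples must not be empty")
    n = len(xs)
    for k, x in enumerate(xs):
        # The empirical cumulative mass at the k'th sorted value is (k + 1) / n.
        if (k + 1) / n >= theta:
            return x
    return xs[-1]


if __name__ == "__main__":
    data = [4.0, 1.0, 3.0, 2.0]
    for theta in (0.05, 0.50, 0.95):
        print(theta, empirical_quantile(data, theta))
```

Evaluating the function for several values of θ gives a coarse, assumption-free picture of the distribution, which is the idea that QR extends to the conditional setting.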
Importantly, the fully estimated distribution preserves useful information, which might otherwise be lost through the more common practice of estimating only a few central moments, such as mean and standard deviation [21]. In turn, the preserved information allows for better informed decisions, e.g., service operators can use the full uncertainty structure of future demand to decide whether to balance the fleet conservatively or more opportunistically. In addition, by taking values of θ close to 0 and 1, QR can be more robust to outliers than mean regression [22].
A variant of QR is Censored Quantile Regression (CQR), where observed realisations from a latent distribution are assumed to be clipped at some thresholds. The distribution of these observations is then a mixture of continuous variables (within the observable range) and discrete variables (at each threshold). Many works on CQR build upon the early formulation by [23], [24], as we ultimately do too in Section III. Some of these works focus on estimators of derived CQR formulations [25]-[29]. Other works on CQR develop more complex and non-parametric models, as we next review, starting with models that are not based on Neural Networks.
[30] devise a Bayesian Inference approach to linear CQR, in which they use the Asymmetric Laplace likelihood, as is also common in other Bayesian CQR works [31]. They evaluate their approach on some commonly used, censored data baselines: synthetic datasets, which we too use in Section IV, and a real-world dataset of women's labor force participation [32]. [33] develop a CQR model based on local linear approximations and evaluate it on synthetically generated datasets. [34] offer a Support Vector Machine-based CQR model, which they evaluate on the same synthetic datasets as in [33], and on a real-world dataset of heart transplant survival. [35] propose a Random Forest-based CQR, which they compare with other Random Forest models on both synthetic data and two real-world datasets: housing prices and a biliary disease.
In the last two decades, Neural Networks have increasingly been used for Quantile Regression (QNN) in multiple research areas, taking advantage of their flexible nonlinear modeling capabilities. For non-censored regression, [36] provides an early form of QNN with a single dense hidden layer, and uses it to estimate a latent distribution of multiperiod financial returns. In studies of the electric power industry, He et al. use non-censored QNN to estimate latent distributions of electricity production [37], [38] and consumption [39], while [40] and [41] use non-censored QNN to predict electricity loads. In the transport domain, [42] use a non-censored QNN with a single hidden neuron to predict 15-minute air traffic in a Chinese airport, and [43] devise a non-censored, multi-output QNN that jointly estimates mean and quantiles, whereby they predict taxi demand in New York City in 30-minute intervals.
The multi-output QNN reduces the computational overhead of estimating multiple quantiles independently, as the full latent distribution is estimated in one forward pass. Traditionally, QR models face the problem of quantile crossing, wherein a lower quantile function crosses a higher one [9]. While this problem has been well studied for non-censored cases [43]-[48], our work is the first to study it in the context of censored data.
In fact, very few works apply QNN in a Censored setting (CQNN). [20] develops a general architecture for both QNN and CQNN, which he implements as an R package. He uses a smoothing technique by [49] to replace the loss function with a differentiable approximation, which is amenable to gradient-based training, and applies the implementation in a censored case study of precipitation forecasting. [50] propose another CQNN model, similar to that of [20] but with deeper architectures. They implement their model in Python via Keras and apply it to censored survival datasets: a synthetic dataset and a breast cancer dataset.
In conclusion, [20] and [50] are so far the only studies on Quantile Regression Neural Networks in a Censored setting. In particular, there are no works on alleviating the computational cost of CQNN or the quantile crossing problem for censored case studies. Moreover, there are no works on CQNN in the transport domain, despite the prevalent censorship in transport data with complex network structures, behavioural feedback, and demand and supply dynamics (Section I). We address all these gaps by devising a Multi-Output Censored Quantile Regression Neural Network and applying it to several datasets of real-world shared mobility services.
Fig. 1 illustrates how the Tilted Loss (TL) of Equation (4) penalizes the prediction error $r = q_{i,\theta} - y_i$ in a manner that depends on θ. For the median (θ = 0.5), the loss is the same regardless of the sign of r. For quantiles above the median (e.g., θ = 0.95), the loss is worse for $y_i > \hat{q}_{i,\theta}$ than for $y_i < \hat{q}_{i,\theta}$ with the same magnitude of r, and vice versa for quantiles below the median (e.g., θ = 0.05). For any θ, the loss equals zero if $y_i = \hat{q}_{i,\theta}$ and is otherwise positive.
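The behaviour described above can be checked numerically with a minimal sketch of the tilted (pinball) loss in its common textbook form, $\max\{\theta u, (\theta - 1) u\}$ with $u = y - \hat{q}$. This form and its sign convention are an assumption here, chosen because they reproduce the properties listed in the text; the function name `tilted_loss` is hypothetical.

```python
def tilted_loss(y, q_hat, theta):
    """Tilted (pinball) loss of predicting quantile q_hat for observation y.

    Assumed textbook form: max(theta * u, (theta - 1) * u) with u = y - q_hat.
    """
    u = y - q_hat
    return max(theta * u, (theta - 1.0) * u)


if __name__ == "__main__":
    # Same error magnitude, opposite signs, for three quantile levels.
    for theta in (0.05, 0.50, 0.95):
        under = tilted_loss(y=2.0, q_hat=1.0, theta=theta)  # y above prediction
        over = tilted_loss(y=1.0, q_hat=2.0, theta=theta)   # y below prediction
        print(f"theta={theta}: y > q_hat -> {under:.3f}, y < q_hat -> {over:.3f}")
```

For θ = 0.5 both cases cost the same, whereas for θ = 0.95 an observation above the predicted quantile is penalized more heavily than one below it.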
Based on Equation (3), we obtain the likelihood function for left-censorship at a stochastic threshold $\tau_i$ (Equation (5)). Note that $\tau_i$ must be specified for all observations, both censored and non-censored.

A. Multi-Output Censored Quantile Regression Neural Network
A naive approach to QNN is to independently fit a Neural Network (NN) for each value of θ, while using either Equation (3) or Equation (5) as a loss function. As the computational cost of training multiple NNs can be high, we propose instead to use a Multi-Output Censored Quantile Regression Neural Network (Multi-CQNN). The corresponding architecture models multiple quantiles simultaneously, so that its output dimensionality equals the number of desired quantiles. That is, the Multi-CQNN architecture has an output neuron for each of the K different quantiles $\{\theta_k\}_{k=1}^{K}$, as depicted in Fig. 2. By estimating all K quantiles together in one forward pass, Multi-CQNN avoids the computational cost of independent training without drastically increasing the number of trainable parameters, as these are shared in the NN layers. Multi-CQNN can be viewed as a multi-task learner [51], where each output node has the task of estimating the quantile related to that node. We extend the loss from Equation (5) to the multi-output case by summing the loss from each task (Equation (6)), where $\hat{q}_{i,\theta,k}$ is the NN output for quantile k. If K = 1, so that only a single quantile is estimated, then Equation (6) indeed reduces to Equation (5). When K > 1, the NN parameters are shared across the different quantiles, and this has a regularising effect on the parameters and outputs.
In particular, this property is effective in alleviating quantile crossing, as we later show in Section IV-C. As noted in [44], quantile crossing is primarily caused by estimating quantiles individually, and so can be alleviated by limiting the flexibility in individual estimation [43]. Parameter sharing indeed limits this flexibility in Multi-CQNN by forcing it to learn a latent data representation that accounts for multiple quantiles at once.
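To make the multi-output objective more concrete, the sketch below sums a censored tilted loss over observations and over the K quantile outputs. It assumes the form commonly used for left-censored quantile regression, in which each prediction is clipped from below at the threshold $\tau_i$ before the tilted loss is applied; the function names, the data layout, and the choice of summing (rather than averaging) over observations are illustrative assumptions, not the paper's exact Equation (6).

```python
def tilted_loss(y, q_hat, theta):
    """Assumed textbook tilted loss: max(theta * u, (theta - 1) * u), u = y - q_hat."""
    u = y - q_hat
    return max(theta * u, (theta - 1.0) * u)


def multi_censored_loss(y, tau, q_hat, thetas):
    """Sum of censored tilted losses over observations i and quantile outputs k.

    y[i]        observed (possibly left-censored) value
    tau[i]      left-censoring threshold of observation i
    q_hat[i][k] predicted quantile for level thetas[k]
    """
    total = 0.0
    for y_i, tau_i, q_i in zip(y, tau, q_hat):
        for theta_k, q_ik in zip(thetas, q_i):
            # Predictions below the threshold are clipped to it, so a censored
            # observation (y_i == tau_i) incurs no loss for such predictions.
            total += tilted_loss(y_i, max(tau_i, q_ik), theta_k)
    return total


if __name__ == "__main__":
    thetas = [0.05, 0.95]
    y = [0.0, 2.0]           # first observation is censored at zero
    tau = [0.0, 0.0]
    q_hat = [[-3.0, -1.0], [1.0, 3.0]]
    print(multi_censored_loss(y, tau, q_hat, thetas))
```

With a single quantile level (K = 1) the inner loop has one term, so the sketch reduces to the single-output censored loss, mirroring the reduction described above.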

B. Optimization
When dealing with right-censored datasets in the next Sections, we slightly modify the architecture by negating the output quantiles and mirroring them (e.g., swapping the output for θ = 0.05 with the output for θ = 0.95). In this manner, the NN treats right-censored data as if it were left-censored. We fit the parameters of the models (denoted β) by minimizing the negative log-likelihood of Equations (5) and (6) using backpropagation with the Adam optimizer [52]. Minimisation of the negative log-likelihood functions mentioned above simplifies to minimisation of the censored quantile error function [20], [53], for both the single-output and the multi-output case. We use a learning rate of 0.01 with norm clipping at 1 and

IV. EXPERIMENTS FOR DEMONSTRATING THE ADVANTAGES OF CENSORED QUANTILE REGRESSION
In this Section, we empirically demonstrate some advantages of using Multi-Output Censored Quantile Regression Neural Networks (Multi-CQNN). First, we compare censorship-aware with censorship-unaware Quantile Regression models and show that censorship-awareness can better reconstruct latent values for both single- and multi-output models. Then, we compare Multi-CQNN to parametric Censored Regression and show the advantages and disadvantages of each modeling method. Finally, we compare quantile crossing between single-output CQNN and Multi-CQNN, and show that our model has substantially fewer quantile crossings than the single-output model, both for the censored and non-censored observations.
The experiments in this Section are based on commonly used, synthetic baseline datasets, and the subsequent Section proceeds to apply Multi-CQNN to model real-world transportation datasets. Each experiment is performed 10 times with equal starting conditions for all models. In the following tables, we report the average over these 10 runs and measure uncertainty via their standard deviation.

A. Non-censored vs. Censored Quantile Regression
Let us first show that the predictive quality of Quantile Regression Neural Networks can improve by accounting for data censorship. We use the same synthetic baseline datasets as in [30], as they consider a censored parametric Bayesian model. The latent variable $y^*$ depends on covariates $x_0 = 1$, $x_1 \in \{-1, 1\}$, $x_2 \in \mathbb{R}$ and on noise ε, which follows some distribution with 0 mean. Left-censorship occurs at zero, so that we observe $y = \max\{0, y^*\}$. For any random variable A and 0 < θ < 1, let $q_\theta(A \mid x)$ denote the θ'th conditional quantile of A given x. Similarly to [30], we experiment with θ = 0.05, 0.50, 0.95 and three noise distributions: Standard Gaussian, Heteroskedastic, and Gaussian Mixture. The corresponding conditional quantiles are given in Equations (16), (17) and (18). Fig. 3 illustrates the distribution of $y^*$ with each noise, for $x_1 = 1$ and several values of $x_2$. Fig. 4 illustrates the conditional quantiles of $y^*$ for each noise and θ. For Heteroskedastic noise, the conditional quantiles of $y^*$ are nonlinear, as their slopes change at $x_2 = -1$.
For each $\varepsilon^{(j)}$, j = 1, 2, 3, we generate a synthetic dataset by independently drawing N = 1000 samples from $\varepsilon^{(j)}$ and x, where We then also compute the corresponding $y^*$ and y, and obtain that approximately 30% of the observations y  Table I provides the percent of zeros among the conditional quantiles q 1,θ (y . The most challenging cases to model are those with θ = 0.05, where the conditional quantiles are particularly prone to censorship.

For each j = 1, 2, 3, we fix train, test and validation sets by randomly partitioning the j'th dataset as 62% : 15% : 33%, respectively. To model the θ'th quantile, we use several Neural Networks (NNs), each consisting of a single layer with activation function η. Namely, each NN takes x as input and outputs either a single estimated quantile (single-output NN) or K estimated quantiles at once (multi-output NN), where β are trainable parameters (weights). We let Multi-CQNN have K times as many trainable parameters as a single CQNN has (note that there are K independent CQNNs), to keep the number of parameters equal across the different models. Weights are initialized to 1, and in each training epoch, the whole train set is processed in a single batch. Training stops when the validation loss does not improve for 10 consecutive epochs.

First, as an example of a model that ignores censorship, we use a non-censored linear model, where η is the identity function and the loss to be minimized is the Tilted Loss without any accounting for censorship (Equation (22)), in single-output or multi-output form. We refer to these models as Quantile Neural Networks (QNN and Multi-QNN). Thereafter, we turn to censorship-aware models, where η is the identity function, so that similarly to [30], the single- and multi-output models are linear (CQNN and Multi-CQNN). We measure the predictive performance of each NN on the test set against the actual conditional quantiles of $y^*$ in Equations (16), (17) and (18).

The measures we use are the coefficient of determination ($R^2$), Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). For any dataset, these measures are defined as follows:
\[ R^2 = 1 - \frac{\sum_{i=1}^{N} (q_{\theta,i} - \hat{q}_{\theta,i})^2}{\sum_{i=1}^{N} (q_{\theta,i} - \bar{q}_\theta)^2}, \qquad \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| q_{\theta,i} - \hat{q}_{\theta,i} \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (q_{\theta,i} - \hat{q}_{\theta,i})^2}, \]
where $\bar{q}_\theta$ is the mean of $q_{\theta,1}, \dots, q_{\theta,N}$. Better predictive quality corresponds to $R^2$ closer to 1 and MAE and RMSE closer to 0. The results appear in Table II, which shows that the censorship-aware models outperform the censorship-unaware models. This holds when evaluating the entire test set or its non-censored subset, where the latent values are revealed. For the most challenging case of θ = 0.05, where more than 60% of the observations are censored, Multi-CQNN is the best performing model across all three synthetic datasets. As the number of censored observations decreases (θ = 0.95), the difference between the censorship-aware and censorship-unaware models becomes less pronounced.

B. Parametric vs. Non-Parametric Censored Quantile Regression
Since its introduction by [54], the Tobit model has become a cornerstone of parametric censored modeling. Tobit assumes that the latent variable depends on covariates linearly with Gaussian white noise, and is censored at a given fixed threshold. Hence in Tobit, the latent quantiles for the i'th observation are given by the parametric distribution $\mathcal{N}(x_i^T \beta, \sigma^2)$, where $x_i$ are covariates, β are linear coefficients to be estimated, and σ is the standard deviation, either given or to be estimated too. Further, the Tobit likelihood for right censorship at a given fixed threshold $\tau_i$ is expressed via ϕ, the Probability Density Function (PDF) of $\mathcal{N}(0, 1)$, and Φ, its Cumulative Distribution Function (CDF). The resulting Tobit negative log-likelihood (Equation (29)) is what we use as a loss function and minimise as described for the previous models.
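For readers less familiar with Tobit, the following is a minimal sketch of a right-censored Tobit negative log-likelihood in its standard textbook form: non-censored observations contribute the Gaussian density, and censored observations contribute the Gaussian tail mass above the threshold. The paper's exact notation for Equation (29) is not reproduced here; the function names, the precomputed means `mu` (standing for $x_i^T \beta$), and the rule that an observation counts as censored when it reaches its threshold are assumptions of this sketch.

```python
import math


def std_normal_pdf(z):
    """PDF of N(0, 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """CDF of N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_nll(y, mu, tau, sigma=1.0):
    """Negative log-likelihood of a right-censored Tobit model.

    y[i]   observed value, assumed censored when y[i] >= tau[i]
    mu[i]  latent mean of observation i (x_i^T beta, computed beforehand)
    tau[i] right-censoring threshold of observation i
    """
    nll = 0.0
    for y_i, mu_i, tau_i in zip(y, mu, tau):
        if y_i >= tau_i:
            # P(latent >= tau) = 1 - Phi((tau - mu) / sigma) = Phi((mu - tau) / sigma).
            nll -= math.log(std_normal_cdf((mu_i - tau_i) / sigma))
        else:
            nll -= math.log(std_normal_pdf((y_i - mu_i) / sigma) / sigma)
    return nll


if __name__ == "__main__":
    print(tobit_nll(y=[0.0, 1.0], mu=[0.0, 1.0], tau=[1.0, 1.0]))
```

The default `sigma=1.0` matches the fixed σ = 1 used in the comparison that follows. For thresholds far above the latent mean, the tail mass can underflow to zero, so a practical implementation would likely use a numerically stable log-CDF.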
Let us now compare Tobit parametric modeling to non-parametric CQNN, using the same synthetic baseline datasets as above. For both modeling methods, we use an NN with a single linear neuron, which we fit similarly to Section IV-A. When fitting the Tobit model, we fix σ = 1 and use the NLL of Equation (29) as the loss function, whereas when fitting CQR for θ = 0.05, 0.95, we use the NLL of Equation (5) or Equation (6) as the loss function. The quantiles from the Tobit model then correspond to quantiles of the Gaussian distribution $\mathcal{N}(x_i^T \beta, \sigma^2)$. Finally, we evaluate the performance via the Tilted Loss of Equation (4) as well as two common measures of Quantile Regression, Interval Coverage Percentage (ICP) and Mean Interval Length (MIL):
\[ \mathrm{ICP} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[ \hat{q}_{i,\theta} \le y_i \le \hat{q}_{i,\theta'} \right] \qquad (30) \]
\[ \mathrm{MIL} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{q}_{i,\theta'} - \hat{q}_{i,\theta} \right) \qquad (31) \]
where $\hat{q}_{i,\theta}$ is the predicted θ quantile for observation i and θ ≤ θ′, so θ′ is a higher quantile than θ. For both measures, we define the prediction interval as the interval between the 0.05'th quantile and the 0.95'th quantile. The ICP should be close to 0.95 − 0.05 = 0.9, while MIL should be as small as possible. Among models with the same ICP, we thus prefer the one that yields the lowest MIL. Table III summarizes the performance of Tobit vs. the QR models, i.e., CQNN and Multi-CQNN. As expected, Tobit performs overall best on the synthetic dataset with Standard Gaussian noise, which most closely matches its modeling assumptions. When evaluated on all test observations, Tobit outperforms QR by obtaining an ICP closer to the desired 0.9 for each synthetic dataset. However, when evaluated on just the non-censored test observations (approx. 30% of each dataset), where the actual values are reliably known, QR outperforms Tobit while maintaining ICP close to 0.9. A particular limitation of the Tobit model is its relatively high MIL (which is constant in Tobit with fixed variance) compared to the CQNN models.
The results thus suggest that Multi-CQNN tends to yield flatter distributions (higher MIL) that better approximate the latent distribution of non-censored observations.
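The two interval measures used in this subsection can be sketched as follows, assuming the usual definitions: ICP is the fraction of observations that fall inside the predicted interval, and MIL is the average width of that interval. Whether the interval bounds are treated as inclusive is an assumption of this sketch, and the function names are hypothetical.

```python
def interval_coverage(y, q_low, q_high):
    """ICP: fraction of observations inside [q_low[i], q_high[i]] (bounds inclusive)."""
    inside = sum(1 for y_i, lo, hi in zip(y, q_low, q_high) if lo <= y_i <= hi)
    return inside / len(y)


def mean_interval_length(q_low, q_high):
    """MIL: average width of the predicted interval."""
    return sum(hi - lo for lo, hi in zip(q_low, q_high)) / len(q_low)


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 10.0]
    q05 = [0.0, 0.0, 0.0, 0.0]   # predicted 0.05 quantiles
    q95 = [4.0, 4.0, 4.0, 4.0]   # predicted 0.95 quantiles
    print("ICP:", interval_coverage(y, q05, q95))
    print("MIL:", mean_interval_length(q05, q95))
```

Reading the two numbers together reflects the preference stated above: an ICP near 0.9 is desirable, and among models with similar ICP the narrower interval is preferred.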

C. Single-output vs. Multi-output Censored Quantile Regression
We now turn to the evaluation of the quantile crossing problem for the CQNN models. For this, we fit multiple single-output CQNN and one Multi-CQNN to estimate the deciles of the three different synthetic datasets. For all the estimated deciles it should hold that $\hat{q}_{\theta,1} \le \hat{q}_{\theta,2} \le \dots \le \hat{q}_{\theta,K}$, assuming that $\hat{q}_{\theta,j+1}$ is a larger decile than $\hat{q}_{\theta,j}$ for all $j \in \{1, \dots, K-1\}$. We evaluate violations of this order via two measures of quantile crossings. Lower values of these measures correspond to fewer quantile crossings, with zero corresponding to the best case of no crossings at all. Table IV summarizes the quantile crossing performance of the models. We observe that the total number of crossings is substantially lower for Multi-CQNN than for single-output CQNN. This is consistent across all three synthetic datasets, both for the censored and non-censored observations, and is achieved with less computational complexity than K single-output NNs. In conclusion, our experiments on the synthetic datasets have shown that the proposed Multi-CQNN model outperforms single-output CQNN in terms of quantile crossing, ICP and MIL.
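As an illustration of how violations of the decile ordering might be quantified, the sketch below counts adjacent pairs of predicted quantiles that appear in the wrong order and sums the size of those violations. These two quantities are hypothetical stand-ins chosen for illustration; they are not claimed to be the exact measures reported in Table IV.

```python
def count_crossings(q_hat):
    """Number of adjacent quantile pairs where the lower level exceeds the higher one.

    q_hat[i] lists the predicted quantiles of observation i in increasing
    order of their level theta.
    """
    return sum(1 for row in q_hat for lo, hi in zip(row, row[1:]) if lo > hi)


def crossing_magnitude(q_hat):
    """Total size of the ordering violations, zero when no quantiles cross."""
    return sum(max(0.0, lo - hi) for row in q_hat for lo, hi in zip(row, row[1:]))


if __name__ == "__main__":
    predictions = [
        [1.0, 2.0, 3.0],   # correctly ordered
        [1.0, 0.5, 2.0],   # second quantile falls below the first
    ]
    print(count_crossings(predictions), crossing_magnitude(predictions))
```

Both quantities are zero exactly when every row is non-decreasing, which matches the best case of no crossings described above.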

V. EXPERIMENTS FOR ESTIMATING LATENT MOBILITY DEMAND
In this Section, we apply our Multi-Output Censored Quantile Regression Neural Network (Multi-CQNN) to real-world data from shared mobility services. Contrary to the synthetic datasets in the previous Section, real-world datasets do not feature the latent variable. Hence, similarly to [6], we treat the available data as $y^*$ and manually censor it per various censorship schemes. We then fit CQNN and Multi-CQNN models for θ = 0.05, 0.95 and evaluate them via ICP (Equation (30)) and MIL (Equation (31)). First, we use a censorship-unaware NN with a single linear unit, which we train to minimize the plain Tilted Loss of Equation (22) (denoted QNN). Then, we equip the same architecture with censorship-awareness, using Equation (5) as loss (CQNN). To show the versatility of our proposed approach, we experiment with the addition of Long Short-Term Memory (LSTM) [55] to the CQNN (CQNN+LSTM), which we extend to Multi-CQNN. We chose the LSTM architecture based on its extensive use in the transportation domain [14], and note that our approach can similarly be used with any desired architecture.

A. Bike-sharing Data
The first real-world dataset is from Donkey Republic, a bike-sharing service provider in the Copenhagen metropolitan area in Denmark. The data consists of pickups and returns of bicycles in predefined hubs (32 hubs in total), from 1 March 2018 until 14 March 2019, which we aggregate spatially into 3 "superhubs" and temporally by the number of daily pickups, as in [6]. The superhubs are chosen based on distance from main tourist attractions and the central train station. Superhubs rarely run out of bicycles at any moment; hence this data represents actual demand quite well.
For this dataset, we experiment with partial censorship of the daily demand. This scenario occurs when the supply of bikes is lower than the actual demand for bicycles, which corresponds to lost opportunities for the bike-sharing provider.
We censor the data as follows:
1) Randomly select a γ portion of all $y^*_i$.
2) For each selected $y^*_i$, independently sample a censorship level and censor $y^*_i$ accordingly.
Our experiments use γ = 0.0, 0.1, . . . , 0.9 and $(c_1, c_2)$ = (0.01, 0.33), (0.34, 0.66), (0.67, 0.99). We define these values of $(c_1, c_2)$ as Low, Medium, and High censorship intensity. For each γ, $c_1$, $c_2$, we independently censor the data 10 times to obtain differently censored datasets $B_1, \dots, B_{10}$, and we partition each $B_j$ consecutively into train, validation and test sets with equal proportions. We then fit each NN model independently for 10 random initialisations of weights, drawn independently from $\mathcal{N}(0, 1)$, where we use the 7 previous lags of observations as explanatory variables. We also define censorship thresholds for the non-censored observations. An observation is then censored if and only if it is above the threshold. After fitting, we evaluate ICP and MIL for each γ, $c_1$, $c_2$ as follows. We noticed that some experiments resulted in an unreasonably high MIL and decided to exclude them from the results. Hence, for each of $B_1, \dots, B_{10}$, we consider only initialisations that yield a reasonable validation MIL, defined as:
\[ \frac{\text{validation MIL}}{\text{mean of train } y^{(j)}} \le 2 . \]
We then select the initialisation that yields validation ICP closest to 0.9. Finally, we average the test ICP and test MIL over the 10 selected initialisations. We summarize the results for the bike-sharing data experiments in Fig. 6 and Fig. 7 for the entire test set. In each Figure, rows range over superhubs, columns range over the censorship intensity, and each horizontal axis ranges over γ. The worst-performing model is the censorship-unaware one: its ICP (Fig. 6) is substantially lower than that of the censorship-aware models, and this difference becomes more prominent as the number of censored observations increases. However, as found in [6], its ICP is occasionally better than that of the censorship-aware models when relatively few observations are censored (γ ≤ 0.2). As we increase the complexity of the architecture, we find higher MIL (Fig. 7) and generally better ICP. In particular, among CQNN models, the LSTM-based model often yields better ICP than the purely linear model. Multi-CQNN tends to have higher ICP than single-output CQNN, at the cost of a larger MIL. A larger MIL may be a desired property for expressing uncertainty in the outputs from a conservative perspective.
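The selection procedure described earlier in this subsection (discard initialisations whose validation MIL exceeds twice the mean of the training targets, then keep the one whose validation ICP is closest to 0.9) can be sketched as follows. The dictionary keys `val_icp` and `val_mil`, the function name, and the choice to return `None` when no initialisation passes the filter are assumptions made for this illustration.

```python
def select_initialisation(runs, mean_train_y, target_icp=0.9, max_relative_mil=2.0):
    """Pick the run with validation ICP closest to target among admissible runs.

    runs          list of dicts with keys "val_icp" and "val_mil" (hypothetical layout)
    mean_train_y  mean of the training targets, used to normalise the MIL
    """
    admissible = [r for r in runs if r["val_mil"] / mean_train_y <= max_relative_mil]
    if not admissible:
        return None  # every initialisation produced an unreasonably wide interval
    return min(admissible, key=lambda r: abs(r["val_icp"] - target_icp))


if __name__ == "__main__":
    runs = [
        {"val_icp": 0.90, "val_mil": 50.0},  # excluded: interval far too wide
        {"val_icp": 0.85, "val_mil": 10.0},
        {"val_icp": 0.99, "val_mil": 12.0},
    ]
    print(select_initialisation(runs, mean_train_y=10.0))
```

In the example, the first run has the ideal ICP but is filtered out by the MIL criterion, so the run with validation ICP 0.85 is selected.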

B. Shared Electric Vehicles Data
The second real-world dataset comes from Share Now, a shared Electric Vehicles (EVs) service operator, also in the Copenhagen metropolitan area. Users can pick up designated EVs from any location in the metropolitan area and return them to any parking spot within the region. In addition, there are small satellite locations where cars can be picked up and returned (Fig. 8). This dataset, which we denote as $D_{EV}$, consists of 2.6 million trips from 2016 to 2019, where each trip record contains the endpoints, driver ID and vehicle ID.
For this dataset, we experiment with complete censorship of daily demand of EV mobility, wherein all observations are censored. A scenario where this occurs is when multiple providers are competing for the same services. For example, one company might only observe the demand from its own fleet of EVs, and therefore all its observations are censored, as some demand is served by the competition. Consequently, every $y_i$ is censored, as illustrated in Fig. 9.
Since we censor all the observations, there is no need to specify a censoring threshold for them. For each α = 10%, 20%, 30%, 40%, we independently apply the censorship scheme 10 times. For each of the 10 censored datasets thus obtained, we partition into train:validation:test as 1 : 1 : 1 and fit each NN with random initialisation of weights, drawn independently from N (0, 1). Finally, we evaluate each NN by averaging its test ICP, MIL, and CL over the 10 experiments.
The results appear in Table V, where we see again that the censorship-unaware model mostly yields both the worst ICP and worst MIL. Among the censorship-aware models, Multi-CQNN with LSTM often yields the best ICP, and otherwise has ICP close to that of the single-output LSTM model. For censorship-aware models, too, better ICP is accompanied by higher MIL, as in Section V-A. As expected with complete censorship, all models deteriorate rapidly as α increases, yielding ICP far below 0.9 for α = 0.4, where our experiments thus stop. Perhaps this deterioration would be less pronounced with larger datasets for higher levels of α.

VI. CONCLUSION
In summary, we have addressed the problem of censored mobility demand and proposed to estimate the entire distribution of latent mobility demand via Multi-Output Censored Quantile Regression Neural Networks (Multi-CQNN). Our approach mitigates the problem of censored and uncertain demand estimation, which is vital for mobility services driven by the user demand, in order to plan supply accordingly. First, we demonstrate the advantages of censorship-aware models on synthetic baseline datasets with various noise distributions, both homoskedastic and heteroskedastic. We find that CQNN outperforms censorship-unaware QNN on both the entire test set and its non-censored subset, where the actual values are reliably known. We also compare Multi-CQNN to the standard Tobit model, which assumes Gaussian white noise, and obtain that Multi-CQNN tends to yield flatter distributions that better approximate the latent uncertainty structure of non-censored observations. We also show that our proposed multi-output extension to CQNN produces substantially fewer quantile crossings for censored and non-censored observations.
Next, we apply Multi-CQNN to real-world datasets from two shared mobility services (bike-sharing and shared Electric Vehicles), which we randomly censor either partially or entirely. For both datasets, more complex CQNN architectures yield higher MIL and generally better ICP. Adding a Long Short-Term Memory (LSTM) often leads to the best performance, and censorship-unaware QNN often produces the worst ICP. We observe that Multi-CQNN performs on par with the single-output models while requiring fewer computational resources. In all experiments, we observe that the Multi-CQNN model tends to outperform the CQNN as censorship intensifies.
The experiments on synthetic and real-world datasets thus lead to similar conclusions about the effectiveness of the Multi-CQNN for censored regression, which further emphasizes the importance of accounting for censorship when modeling mobility demand. For future work, we plan to take advantage of possible spatio-temporal correlations in the datasets, e.g., using Convolutional Neural Networks as in [57] or Graph Neural Networks [58], as well as compare Multi-CQNN to Censored Gaussian Processes [6]. In addition, we propose to explore the impact of censored regression on the operation of mobility services, which are driven by demand [5].

VIII. BIOGRAPHY SECTION
Frederik Boe Hüttel is currently pursuing the Ph.D. degree in Machine Learning for Smart Mobility with the Technical University of Denmark (DTU). His main research interests include machine learning models, intelligent transportation systems, and demand modeling.

Filipe Rodrigues is Associate Professor at the Technical University of Denmark (DTU), where he is working on machine learning models for understanding urban mobility and the behaviour of crowds, with emphasis on the effect of special events in mobility and transportation systems. He received a Ph.D. degree in Information Science and Technology from University of Coimbra, Portugal, where he developed probabilistic models for learning from crowdsourced and noisy data. His research interests include machine learning, probabilistic graphical models, natural language processing, intelligent transportation systems and urban mobility.