Fairness-Enhancing Deep Learning for Ride-Hailing Demand Prediction

Short-term demand forecasting for on-demand ride-hailing services is a fundamental issue in intelligent transportation systems. However, previous research has predominantly focused on improving prediction accuracy, ignoring fairness issues such as systematic underestimation of travel demand in disadvantaged neighborhoods. This study investigates how to measure, evaluate, and enhance prediction fairness between disadvantaged and privileged communities in spatial-temporal demand forecasting of ride-hailing services. We develop a socially-aware neural network (SA-Net) that integrates socio-demographics and ridership information for fair demand prediction, and introduce a bias-mitigation regularization to reduce the prediction error gap between black and non-black communities and between low-income and high-income communities. The experimental results, using Chicago Transportation Network Company (TNC) data, demonstrate that our de-biasing SA-Net model outperforms other models in both prediction accuracy and fairness. Notably, the SA-Net reduces the Mean Absolute Error (MAE) by 2.3% compared to state-of-the-art models. When coupled with the bias-mitigation regularization, the de-biasing SA-Net effectively bridges the mean percentage prediction error (MPE) gap between the disadvantaged and privileged groups, and protects the disadvantaged regions against systematic underestimation of TNC demand. Specifically, our approach reduces the MPE gap between black and non-black communities by 67% without compromising overall prediction accuracy.


INTRODUCTION
In recent years, on-demand ride-hailing services have grown rapidly. Transportation network companies (TNCs) such as Uber and Lyft provide ride-hailing services by connecting passengers with drivers based on real-time information (1,2). Reliable and accurate short-term travel demand forecasting is a promising tool to balance vehicle supply and demand with low cost and high quality of service (3,4,5). Researchers have developed a series of data-driven approaches to predict travel demand in real time, including time series analysis methods (6,7), machine learning methods (8,9) and deep learning models (3,10,11). These approaches typically divide the study region into small areas, use the past travel requests in a time interval as the historical demand, and then seek to enhance the prediction accuracy of the future travel demand as a function of the historical demand (assuming certain spatial and temporal correlations among them) and exogenous features such as weather and holidays.
However, a narrow focus on prediction accuracy ignores the crucial social consequences underlying the prediction tasks, such as unfairness in travel demand forecasting. For instance, since transport operators depend on the predicted passenger demand to dispatch vehicles, systematically under-predicted travel demand in disadvantaged neighborhoods may lead to inadequate service provision for certain groups. The existing literature has the following two limitations. First, most previous studies evaluated the performance of demand predictions by the average prediction accuracy across the whole study region, while research into the disparity of predictive performance between disadvantaged and privileged areas is very scarce. This raises an equity concern because if the ride-hailing demand for the disadvantaged neighborhoods is systematically underestimated, the vehicles allocated to these neighborhoods may not be enough to serve the actual demand. Second, most previous models did not consider the socioeconomic and demographic information of the areas when making travel demand predictions. Areas with different socioeconomic and demographic makeup could have very different spatial-temporal dependencies. Failure to account for the heterogeneity of these spatial-temporal dependencies can lead to biased model results.
To overcome these limitations, this paper proposes a novel strategy to improve prediction fairness while retaining high prediction accuracy. This strategy comprises a new deep learning architecture, named the socially aware network (SA-Net), and a bias-mitigation regularization method, which together achieve fairness-aware travel demand predictions. While previous research typically adopted spatially-invariant convolutional kernels to capture spatial dependencies, this new network incorporates a novel Socially-Aware Convolution (SAC) module that adapts the standard invariant kernel at each area of the study region based on the socio-demographic makeup of that area. This makes the module highly flexible, so it can better capture the spatial-temporal dependencies across different locations. The bias-mitigation regularization method modifies the traditional objective functions in deep learning travel demand predictions by adding a fairness regularization term, thus facilitating fair travel demand predictions.
To the best of our knowledge, this paper is one of the first attempts to improve prediction fairness in spatial-temporal travel demand forecasting of on-demand ride-hailing services. The main contributions of this paper can be summarised as follows:
• We propose a new model (SA-Net) that adopts location-specific modification to the standard spatially-invariant convolutional filters. The proposed network can flexibly capture the spatial heterogeneity by incorporating the local socioeconomic and demographic information into the prediction process.
• We propose a fairness metric, the mean percentage error gap (MPE Gap), which measures the gap of mean percentage prediction error between the disadvantaged and privileged groups. A positive MPE indicates that the model has underestimated the demand, whereas a negative MPE indicates an overestimation of the demand.
• We develop a bias-mitigation regularization method that allows the network to learn fair predictions by bringing down the MPE Gap between the disadvantaged and privileged groups.
• Experiments on Chicago TNC data reveal the risk of generating spatially unfair demand prediction with the state-of-the-art spatial-temporal deep learning predictions, and show that our proposed new method can not only reduce the fairness gap between the disadvantaged and privileged groups, but also increase the overall prediction accuracy.
The rest of the paper is organized as follows. Section 2 reviews the existing literature on ride-hailing demand prediction and fairness in machine learning. Section 3 defines the research problem. Section 4 describes the model architecture of the proposed SA-Net, the fairness evaluation metrics, as well as the bias-mitigation regularization method. Section 5 shows the experiment results, which compare the prediction accuracy and fairness between the proposed de-biasing SA-Net and the benchmark models on the Chicago TNC dataset. Section 6 concludes the paper.

LITERATURE REVIEW

2.1 Spatial-temporal travel demand forecasting
Short-term travel demand forecasting has been a fundamental issue in intelligent transportation systems (3,10,12,13,14). The earliest research on travel demand forecasting was based on traditional time-series regression models such as the Autoregressive Integrated Moving Average (ARIMA) model (6), the Kalman filter and their variants. For example, Li et al. (7) proposed an improved ARIMA-based prediction method to forecast the spatial-temporal variation of passengers in urban hotspots. Lippi et al. (15) coupled the Seasonal ARIMA (SARIMA) model with a Kalman filter to capture the seasonal patterns of temporal information for freeway traffic flow prediction.
In recent years, researchers have been moving from classical statistical models to machine learning based approaches because of the explosive growth of data accessibility and computing power (16). Early machine learning models include support vector regression (8) and regression trees (9). More recently, deep learning methods have become increasingly popular due to their capability of approximating decision functions and their interpretability through economics theory (17,18). Studies also investigated how to integrate deep learning and classical statistics in travel demand prediction using statistical learning theory (19,20). The network structures in deep learning can successfully capture the complicated spatial-temporal correlations in the travel demand data for prediction. The major components are summarised below.
The Recurrent Neural Network (RNN) is one of the most frequently used deep learning architectures for capturing sequential dependencies (21). However, RNNs suffer from the "vanishing gradient" problem, which makes it difficult for them to learn long data sequences (22). As such, Long Short-Term Memory (LSTM) was developed as an enhanced form of RNN and has been widely adopted to explore long-range temporal dependencies in the data (23,24).
Previous studies have found that vehicle accumulation and dissipation can influence the traffic volume of nearby locations; therefore, spatial correlations should be considered in demand forecasting (25). Convolutional Neural Networks (CNNs) have been widely adopted to capture the spatial correlations in grid-based travel demand predictions. CNNs capture the spatial dependencies using localized kernels. They were initially designed for the analysis of visual imagery, and were later applied to learn the local and global spatial correlations in travel demand forecasting (22). Graph Convolutional Networks (GCNs) are capable of capturing non-Euclidean spatial correlations in network-structured data (26,27,28). GCNs were mostly applied in station-based scenarios or scenarios where the traffic network is accessible, and have only recently been applied to region-based scenarios (29).
Some recent studies have integrated different deep learning neural networks to account for the complex spatial-temporal relationships in travel demand. Although the abovementioned methods have made remarkable progress on improving the prediction accuracy, most of them do not consider fairness when making predictions. Fairness essentially involves the evaluation of a predictive model regarding its social consequences, so without incorporating any socio-demographic information, the predictive models are hardly aware of the social consequences. Motivated by this research gap, this paper aims to evaluate and improve fairness in TNC travel demand forecasting by incorporating socio-demographics, proposing fairness metrics, and developing a fairness-enhancing prediction method.

Fairness in machine learning
There exists extensive machine learning literature showing that a model can act discriminatorily on one population in a variety of settings such as criminal risk assessment (31,32), clinical care (33,34) and credit risk evaluation (35,36).These studies made significant contributions in terms of formalizing fairness in machine learning (37,38), designing fairness-enhancing algorithms (39,40,41) and solving fairness concerns in real-world industries (42,43).
However, literature that investigates the algorithmic fairness issue in transportation research is very scarce. In the domain of travel behavior modeling, Zheng et al. (44) demonstrated prediction disparities regarding race, income, medical condition and region using the 2017 National Household Travel Survey (NHTS) and the 2018-2019 My Daily Travel Survey in Chicago. The authors adopted an absolute correlation regularization method to mitigate the prediction biases. In the spatial-temporal travel demand modeling setting, to the best of our knowledge, very few studies have tried to tackle the fairness issue. Yan and Howe (45) modified the loss function in deep learning to reduce the gap of per capita predicted bikeshare demand between the disadvantaged and advantaged regions. The modification is based on the fairness assumption that the per capita predicted demand should be the same across regions. Yan and Howe (46) leveraged adversarial learning to mitigate the gap in prediction errors of bikeshare demand between the advantaged and disadvantaged groups.
Although much progress has been made in addressing algorithmic bias, there are still several research limitations that need to be addressed. First, one critical source of bias is feature selection, where selected variables fail to capture sufficient details that affect different outcomes (47,48). To combat this, it is crucial to develop strategies to integrate socio-demographic information into the modeling process. Another limitation is that previous research on fairness has measured it based on the absolute value of demand, which can lead to errors in disadvantaged groups being considered insignificant. To address this, we propose a new measure of fairness based on the relative value of demand. This measure compares errors with the typical demand of the region, which is based on the concept of algorithmic fairness known as "equality of odds" (49). This principle requires that all individuals who have a TNC demand should have an equal chance of having it reflected in the prediction, regardless of their social and demographic characteristics. By using this new measurement of fairness, we can better understand and mitigate algorithmic bias in ride-sharing platforms. To address these limitations, we build upon the state-of-the-art spatial-temporal travel demand models, and propose a novel method for fair predictions of TNC travel demand.

PROBLEM DESCRIPTION AND PRELIMINARIES
The goal of this study is to predict the short-term TNC demand in the study area. Following the method proposed by Ke et al. (3) and Guo and Zhang (10), the study area is partitioned into $I \times J$ grids, with each grid cell referring to a zone. The time interval considered is 1 hour. It is assumed that future TNC demand is correlated with the TNC demand in the past. It is also influenced by seasonality (time-of-day, day-of-week, etc.) and exogenous variables such as the weather. The variables examined in this study are defined as follows:

1. TNC demand
The TNC demand at the $t$-th time slot across the whole region is denoted as $D_t \in \mathbb{R}^{I \times J}$, which is defined as the number of TNC orders placed during that time interval. The TNC demand in grid $(i, j)$ is then denoted as $(D_t)_{i,j}$.

2. Time-of-day, day-of-week, holiday
By examining the Chicago traffic index data, we categorize the 24 hours of each day into three periods: the peak hours (7am-9am and 3pm-7pm on workdays), the mid-peak hours (9am-3pm on workdays and 11am-7pm on weekends), and the off-peak hours (7pm-7am on workdays and 7pm-11am on weekends). We use $tod_t$ to indicate this time-of-day variable, which takes values 0, 1, 2 if $t$ belongs to the off-peak hours, the mid-peak hours and the peak hours, respectively. $dow_t$ is the day-of-week variable, which takes value 1 if $t$ falls on a weekday and 0 if $t$ falls on a weekend. The dummy variable $h_t$ indicates whether $t$ falls on a holiday or not.

3. Weather
We consider precipitation as the weather variable, which is denoted as $p_t$. The precipitation data is obtained from the website of the National Centers for Environmental Information (50).

4. Socio-demographic data
The 2019 American Community Survey (ACS) 5-year estimates data is used to produce socio-demographic data by census tract. The socio-demographic variables include total population, population per square kilometer, percentage of African-American population, percentage of female population, percentage of Spanish speakers, percentage of foreign-born population, median household income, percentage of population with 2019 household income lower than $25,000, percentage of college graduates, percentage of population aged between 25 and 34, percentage of population aged over 65, percentage of transit commuters and percentage of population with no household vehicles. We use $Z_p$ to represent the $p$-th socio-demographic variable and use $P$ to denote the total number of socio-demographic variables.
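The time-of-day, day-of-week and holiday encoding described in item 2 can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, each period is assumed to be a half-open hour range (for example, 7am-9am covers the hours starting at 7 and 8), workdays are assumed to be Monday to Friday, and the holiday calendar is assumed to be supplied by the caller.

```python
from datetime import datetime

def time_of_day(ts: datetime) -> int:
    """Return 2 (peak), 1 (mid-peak) or 0 (off-peak) for the hour starting at ts."""
    hour = ts.hour
    is_workday = ts.weekday() < 5  # assumption: Monday (0) to Friday (4)
    if is_workday:
        if 7 <= hour < 9 or 15 <= hour < 19:
            return 2  # peak: 7am-9am and 3pm-7pm on workdays
        if 9 <= hour < 15:
            return 1  # mid-peak: 9am-3pm on workdays
        return 0      # off-peak: 7pm-7am on workdays
    if 11 <= hour < 19:
        return 1      # mid-peak: 11am-7pm on weekends
    return 0          # off-peak: 7pm-11am on weekends

def day_of_week(ts: datetime) -> int:
    """Return 1 for a weekday and 0 for a weekend day."""
    return 1 if ts.weekday() < 5 else 0

def temporal_features(ts: datetime, holidays=frozenset()):
    """Return (dow, tod, h); holidays is a hypothetical set of datetime.date objects."""
    h = 1 if ts.date() in holidays else 0
    return (day_of_week(ts), time_of_day(ts), h)
```

For example, `temporal_features(datetime(2019, 3, 4, 16))` describes a Monday afternoon peak hour that is not a holiday.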
The target of this study is to predict the TNC demand at time $t$ ($D_t$), given the historical TNC demand, the time series features and the socio-demographic variables: $\{D_s, p_s \mid s = 0, \ldots, t-1\}$, $\{tod_s, dow_s, h_s \mid s = 0, \ldots, t\}$ and $\{Z_p \mid p = 1, \ldots, P\}$. This research focuses on two objectives: prediction accuracy and fairness. Prediction accuracy refers to the goal of minimizing the overall prediction errors. Prediction fairness refers to the goal of reducing the gap in mean percentage errors between the disadvantaged and privileged groups. Prediction fairness is also enhanced if the model increases the accuracy for the disadvantaged group more than for the privileged group.

METHODOLOGY
This research designs a novel SA-Net to predict the short-term TNC demand with enhanced fairness. We first introduce the Socially-Aware Convolution (SAC), a base module that is repeatedly used in SA-Net, and describe how SAC is adapted from the standard CNN. We then introduce SAC-LSTM, which combines SAC and LSTM. After that, we explain the complete model architecture used in this study.

CNN and SAC
In this section, we start with a formulation of the standard convolutional neural network, and then extend it to the Socially-Aware Convolution (SAC). The concept of SAC is illustrated in Figure 1. We start from a standard convolution (Equation 1), in which $Y \in \mathbb{R}^{O \times S \times S}$ denotes the output tensor, $X \in \mathbb{R}^{I \times S \times S}$ is the input tensor, and $W \in \mathbb{R}^{O \times I \times V \times V}$ denotes the filter weight. $O$, $I$, $S$ and $V$ represent the output channel size, input channel size, image size, and kernel size. $[p, q]$ denotes the pixel coordinates, $m$ and $n$ are the indices for the output and input channels, and $\sigma$ is the activation function. From Equation 1, we can see that the filter weight $W[m, n, i, j]$ is invariant to image locations. Therefore, the standard convolution is content-agnostic. To account for the local information, we use the Socially-Aware Convolution (SAC), which builds upon the work in (51). A SAC modifies the spatially invariant filter $W$ with an adapting kernel $K$ (Equation 2), where $F \in \mathbb{R}^{S \times S}$ is the feature map, which will be explained in the following subsection, and $K$ is a Gaussian kernel function evaluated on pairs of feature values. The kernel values are higher for regions with similar feature values. The SAC operation represented by Equation 2 adapts the standard convolution filter at each pixel by multiplying the spatially-invariant filter $W$ with a spatially-varying adapting filter $K$. The feature map $F$ picks up local features that reflect the relationships between different regions on the map.
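For reference, the two operations described above can be written out as follows. This is a sketch consistent with the notation in the text, not a verbatim restatement of the paper's equations: the offsets $i, j$ are assumed to range over the $V \times V$ kernel window centered on $[p, q]$, bias terms are omitted, and the Gaussian kernel is assumed to have unit bandwidth.

```latex
% Standard convolution: the filter W does not depend on the location [p, q].
Y[m, p, q] = \sigma\Big( \sum_{n=1}^{I} \sum_{i, j} W[m, n, i, j] \, X[n, p + i, q + j] \Big)

% Socially-aware convolution: W is re-weighted by K at every pixel.
Y[m, p, q] = \sigma\Big( \sum_{n=1}^{I} \sum_{i, j} K\big(F[p, q], F[p + i, q + j]\big) \, W[m, n, i, j] \, X[n, p + i, q + j] \Big)

% Gaussian adapting kernel on scalar feature values (assumed unit bandwidth).
K(f, f') = \exp\Big( -\tfrac{1}{2} (f - f')^{2} \Big)
```

Under this form, $K = 1$ wherever the feature values in a neighborhood are identical, so the socially-aware convolution reduces to the standard one in socio-demographically homogeneous areas.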

Feature map construction
We construct the feature map $F$ as a linear combination of various socio-demographic variables, as shown in Figure 2. The feature value $f_{ij}$ is calculated from the values of the socio-demographic variables $p$ (e.g. population density, race, income) for region $[i, j]$. By applying a Gaussian kernel function to the feature values of the center pixel and its surrounding pixels, for each pixel value prediction, we emphasize the neighboring pixels that are more similar to this specific pixel in terms of the socio-demographic features. The underlying assumption is that regions that have similar socio-demographic characteristics to their neighbors should also have a similar level of TNC demand to their neighbors.
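The feature map construction and the adapted convolution can be illustrated with a small single-channel sketch in plain Python. It is not the authors' implementation: the combination weights `w` are hypothetical stand-ins for whatever coefficients define the linear combination, the Gaussian kernel is assumed to have unit bandwidth, cells outside the grid are treated as zero padding, and the bias and activation are omitted.

```python
import math

def feature_map(Z, w):
    """Linear combination of P socio-demographic grids.

    Z: list of P grids (each a list of rows); w: list of P hypothetical weights.
    """
    rows, cols = len(Z[0]), len(Z[0][0])
    return [[sum(w[p] * Z[p][i][j] for p in range(len(w))) for j in range(cols)]
            for i in range(rows)]

def gaussian_kernel(f_center, f_neighbor):
    """Similarity in (0, 1]; equals 1 when the two feature values match."""
    return math.exp(-0.5 * (f_center - f_neighbor) ** 2)

def sac_single_channel(X, W, F):
    """Apply a V x V filter W to grid X, re-weighted per pixel by feature map F."""
    rows, cols = len(X), len(X[0])
    r = len(W) // 2  # kernel radius, assuming an odd kernel size V
    Y = [[0.0] * cols for _ in range(rows)]
    for p in range(rows):
        for q in range(cols):
            total = 0.0
            for i in range(-r, r + 1):
                for j in range(-r, r + 1):
                    u, v = p + i, q + j
                    if 0 <= u < rows and 0 <= v < cols:  # zero padding elsewhere
                        k = gaussian_kernel(F[p][q], F[u][v])
                        total += k * W[i + r][j + r] * X[u][v]
            Y[p][q] = total
    return Y
```

With a constant feature map the adapting kernel is 1 everywhere and the function behaves like a standard convolution, while a neighbor with a very different feature value contributes almost nothing to the output at the center pixel.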

LSTM and SAC-LSTM
We use LSTM, a special kind of Recurrent Neural Network (RNN), to process the temporal information. LSTM is designed to avoid the long-term dependency problem. The model first passes a sequence of input vectors to the memory cell tensors through the input gate, and then drops the redundant information through the forget gate, and the cell state is updated accordingly. Finally, after several iterations, the output gate outputs a hidden sequence (52).
When dealing with the travel demand forecasting problem with spatial-temporal data, Ke et al. (3) proposed using the Conv-LSTM, a network that combines CNN and LSTM, to capture the spatial dependencies. Unlike LSTM, Conv-LSTM converts all the inputs, memory cell values, hidden states and various gates from 2D matrices to 3D tensors. Besides, Conv-LSTM replaces the Hadamard product with the convolutional operator, which is used to explore spatially local correlations. However, Conv-LSTM utilizes standard convolutional filters, which are replicated across the tensors with shared weights, thus failing to account for the heterogeneity of spatial correlations. To address this drawback of standard convolutions, we modify the Conv-LSTM by replacing the standard convolutions with the SAC, and name the new network SAC-LSTM. In the formulation of SAC-LSTM, the weight matrices $W_{xf}, W_{hf}, W_{xc}, W_{hc}, W_{xo}, W_{ho}$ denote the SAC weights, which are represented by W' in Figure 1. "*" stands for the convolutional operator. $I_t$, $F_t$, $C_t$, $O_t$, $H_t$ are the improved input gate, forget gate, cell state, output gate and hidden state that embed the spatial dependencies. ○ denotes the Hadamard product (i.e. element-wise product). $\sigma$ and $\tanh$ are nonlinear activation functions: $\sigma(x) = 1 / (1 + e^{-x})$ and $\tanh(x) = (e^{x} - e^{-x}) / (e^{x} + e^{-x})$.
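A gate formulation consistent with this description is sketched below. It follows the common Conv-LSTM form with every convolution replaced by the SAC, and it rests on several assumptions: there are no peephole connections, the input-gate weights $W_{xi}$, $W_{hi}$ and the bias terms $b_i, b_f, b_c, b_o$ are named here for illustration only, and $X_t$ denotes the input at time $t$. Note that the forget gate $F_t$ is distinct from the feature map $F$ used inside the SAC.

```latex
I_t = \sigma\big( W_{xi} * X_t + W_{hi} * H_{t-1} + b_i \big)
F_t = \sigma\big( W_{xf} * X_t + W_{hf} * H_{t-1} + b_f \big)
C_t = F_t \circ C_{t-1} + I_t \circ \tanh\big( W_{xc} * X_t + W_{hc} * H_{t-1} + b_c \big)
O_t = \sigma\big( W_{xo} * X_t + W_{ho} * H_{t-1} + b_o \big)
H_t = O_t \circ \tanh( C_t )
```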

Model description
In this section, we propose a novel socially aware network (SA-Net) to forecast the short-term TNC demand. The architecture of the network is illustrated in Figure 3. It comprises two parts: the part on the right captures the spatial-temporal variables (i.e. TNC demand) using a stack of SAC-LSTM layers, and the part on the left processes the non-spatial temporal variables using a stack of LSTM layers.
For the spatial-temporal part, $U_{t-k}$, $k = 1, 2, \ldots, d$, represent the output tensors at the last layer of the stacked SAC-LSTM layers. $W_{ux}$ represents the convolutional operation with the SAC kernel, which is applied to further capture the spatial dependency at the final layer, and also to reduce the number of output channels to 1.

Structure for temporal variables
The temporal predictors used in this study include the time-related variables and the weather feature. The time-related variables include time-of-day, day-of-week and holiday indicators. The weather feature is represented by the amount of precipitation. We create a new variable $v_t = (dow_t, tod_t, h_t)$ that concatenates $dow_t$, $tod_t$ and $h_t$, and use $p_t$ to represent the amount of precipitation at time $t$. These temporal features are likely to impact the TNC demand across the whole region. In the network for the time-series variables, $V_{t-k}$ and $P_{t-k}$, $k = 1, 2, \ldots, d$, are the output tensors at the last layer of the stacked LSTM layers for the time variables and the precipitation variable. $w_{vx}$ and $w_{px}$ denote the fully connected layers following the stacked LSTM layers, which reduce the number of output channels to 1. $F_R$ denotes a reshaping function that repeats a value across space: $F_R: \mathbb{R} \to \mathbb{R}^{M \times N \times 1}$, where $(F_R(x))_{m,n,1} = x$ for any $m \in \{1, 2, \ldots, M\}$, $n \in \{1, 2, \ldots, N\}$. $F_R$ is deployed to make the dimensions of the LSTM outputs $X_v$ and $X_p$ the same as those of the SAC-LSTM output $X_u$.

Fusion
The final estimated TNC demand at time $t$ is a weighted combination of the estimated outputs from the different parts of the network (Equation 7).
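The reshaping function $F_R$ and the final weighted combination can be sketched as follows. The exact form of the fusion weights is not recoverable from the text, so this example assumes one scalar weight per branch; the function names and the weights `w_u`, `w_v`, `w_p` are hypothetical, and the trailing channel dimension of size 1 is dropped for brevity.

```python
def repeat_over_grid(x, M, N):
    """F_R: copy the scalar x into every cell of an M x N grid."""
    return [[x for _ in range(N)] for _ in range(M)]

def fuse(X_u, X_v, X_p, w_u, w_v, w_p):
    """Weighted sum of the spatial branch X_u and the two temporal branches.

    Assumption: each weight is a single scalar; the original model may instead
    learn element-wise weights of the same shape as the grids.
    """
    rows, cols = len(X_u), len(X_u[0])
    return [[w_u * X_u[i][j] + w_v * X_v[i][j] + w_p * X_p[i][j]
             for j in range(cols)]
            for i in range(rows)]
```

Because the temporal branches produce one value per time step, `repeat_over_grid` is what lets them be added cell by cell to the spatial output.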

Accuracy and fairness metrics
The performance of the various models is evaluated based on two types of metrics: accuracy metrics and fairness metrics. Two commonly used accuracy metrics, Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE), are adopted to evaluate the prediction accuracy of the models in this work. In their definitions, $y_t^i$ and $\hat{y}_t^i$ are the real and predicted travel demands at time interval $t$ in region $i$, $T$ represents the total number of time intervals, and $N$ represents the total number of regions. $N_t$ denotes the set of regions with $y_t^i > 0.1$, which is defined to guarantee that the denominator of the absolute percentage error for the regions included is not zero.
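With this notation, the two accuracy metrics take the following form. The MAE expression is the standard one; for MAPE, the normalization by the total number of included region-time pairs is an assumption, since averaging per time step first would be an equally plausible reading.

```latex
\mathrm{MAE} = \frac{1}{T N} \sum_{t=1}^{T} \sum_{i=1}^{N} \left| y_t^i - \hat{y}_t^i \right|

\mathrm{MAPE} = \frac{1}{\sum_{t=1}^{T} |N_t|} \sum_{t=1}^{T} \sum_{i \in N_t} \frac{\left| y_t^i - \hat{y}_t^i \right|}{y_t^i}
```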
While MAE and MAPE have been widely utilized to measure the accuracy of model predictions, one limitation of these two metrics is that they do not consider the directions of the errors. Given that underestimations and overestimations of the TNC demand have very different practical implications which should not be ignored, we also examine the Mean Percentage Error (MPE) of the model predictions, defined as the average of the signed percentage errors $(y_t^i - \hat{y}_t^i) / y_t^i$. A positive value of MPE indicates an underestimation of the TNC demand (i.e. the real demand is larger than the predicted demand), whereas a negative value of MPE indicates an overestimation of the TNC demand. The magnitude of a positive percentage error in region $i$ at time $t$ can be thought of as the chance that an individual in region $i$ at time $t$ who had TNC demand failed to receive the service, if the TNC service was allocated exactly according to the demand estimate. Therefore, it is important to make sure that the MPE is not systematically different between the disadvantaged and privileged communities. This concept is connected to one important notion of algorithmic fairness, equality of odds, which states that a predictor $\hat{Y}$ satisfies equalized odds with respect to protected attribute $Z$ and outcome $Y$ if $\hat{Y}$ and $Z$ are independent conditional on $Y$ (49).
We propose the MPE gap as a fairness metric, which measures the difference in MPE between two groups (e.g. the black communities and the non-black communities). The metric is defined as the MPE of the minority group minus the MPE of the majority group, where $Z_0$ denotes the minority group and $Z_1$ denotes the majority group. Therefore, $i \in Z_0$ represents the set of regions that are within the minority group, and $i \in Z_1$ represents the set of regions that are within the majority group (i.e. not within the minority group). For example, if the sensitive variable of interest is ethnicity and $Z_0$ is used to represent the black-dominated communities, then $Z_1$ represents the non-black communities. In this case, $i \in Z_0$ and $i \in Z_1$ denote regions that belong to the black communities and those that belong to the non-black communities, respectively.
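Written out, the group-level MPE and the MPE gap can take the following form. The sign convention follows the text (a positive value means underestimation); restricting the sums to $N_t$ and normalizing by the number of included region-time pairs are assumptions.

```latex
\mathrm{MPE}(Z) = \frac{1}{\sum_{t=1}^{T} |N_t \cap Z|} \sum_{t=1}^{T} \sum_{i \in N_t \cap Z} \frac{y_t^i - \hat{y}_t^i}{y_t^i}

\mathrm{MPE\ gap} = \mathrm{MPE}(Z_0) - \mathrm{MPE}(Z_1)
```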
To achieve a fair prediction, we want the absolute value of the MPE gap to be as close to zero as possible. A positive value of the MPE gap indicates that we are underestimating the TNC demand for the minority group compared with the majority group, whereas a negative value of the MPE gap suggests a relative underestimation of the demand for the majority group.
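The accuracy and fairness metrics above can be computed with a few lines of plain Python. This is an illustrative sketch with hypothetical function names: demand is passed as a list over time steps of lists over regions, percentage errors are pooled over all region-time pairs with real demand above 0.1 (the pooling is an assumption), and groups are formed with a cutoff on a sensitive attribute such as the share of black population.

```python
def _signed_pct_errors(y, y_hat, regions=None, threshold=0.1):
    """Signed percentage errors (y - y_hat) / y over cells with y > threshold."""
    errors = []
    for y_t, p_t in zip(y, y_hat):
        for i, (actual, pred) in enumerate(zip(y_t, p_t)):
            if actual > threshold and (regions is None or i in regions):
                errors.append((actual - pred) / actual)
    return errors

def mae(y, y_hat):
    """Mean absolute error over all time steps and regions."""
    diffs = [abs(a - b) for y_t, p_t in zip(y, y_hat) for a, b in zip(y_t, p_t)]
    return sum(diffs) / len(diffs)

def mape(y, y_hat):
    """Mean absolute percentage error over cells with y > 0.1."""
    errs = _signed_pct_errors(y, y_hat)
    return sum(abs(e) for e in errs) / len(errs)

def mpe(y, y_hat, regions=None):
    """Mean percentage error; positive means demand is underestimated."""
    errs = _signed_pct_errors(y, y_hat, regions)
    return sum(errs) / len(errs)

def mpe_gap(y, y_hat, minority, majority):
    """MPE of the minority regions minus MPE of the majority regions."""
    return mpe(y, y_hat, minority) - mpe(y, y_hat, majority)

def split_groups(share, cutoff=0.5):
    """Regions whose sensitive-attribute share exceeds cutoff form the minority set."""
    minority = {i for i, s in enumerate(share) if s > cutoff}
    majority = set(range(len(share))) - minority
    return minority, majority
```

A positive result from `mpe_gap` means the minority regions are underestimated relative to the majority regions, matching the interpretation given above.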

De-biasing objective function
To jointly train for accuracy and fairness, we use a loss function that is a weighted sum of an accuracy loss and a fairness loss. The accuracy loss is aimed at reducing both MAE and MAPE. As before, $y_t^i$ and $\hat{y}_t^i$ are the real and predicted travel demands at time interval $t$ in region $i$, $T$ represents the total number of time intervals, $N$ represents the total number of regions, and $N_t$ denotes the set of regions with $y_t^i > 0.1$. $\lambda$ is a regularization parameter balancing the MAE and MAPE tradeoff. In this study, we fix $\lambda$ to be 10 since the magnitude of MAE is roughly ten times that of MAPE.
In the fairness loss, $z_i$ denotes the value of the sensitive attribute (e.g. the proportion of black population) for region $i$. $\tilde{z}_i = (z_i - \bar{z}) / \sigma_z$ is the normalized $z_i$, with $\bar{z}$ and $\sigma_z$ respectively representing the mean and standard deviation of $z_i$ across all regions.
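One way to write the objective consistent with these definitions is given below. Several choices here are assumptions rather than statements of the original equations: the fairness weight is written as $\gamma$ (the parameter reported as $\gamma = 0$ in the results), $\lambda$ is placed on the MAPE term, and the fairness term is taken as the absolute value of the average product of $\tilde{z}_i$ and the signed percentage error.

```latex
L = L_{accuracy} + \gamma \, L_{fairness}

L_{accuracy} = \frac{1}{T N} \sum_{t=1}^{T} \sum_{i=1}^{N} \left| y_t^i - \hat{y}_t^i \right|
             + \lambda \, \frac{1}{\sum_{t=1}^{T} |N_t|} \sum_{t=1}^{T} \sum_{i \in N_t} \frac{\left| y_t^i - \hat{y}_t^i \right|}{y_t^i}

L_{fairness} = \left| \frac{1}{\sum_{t=1}^{T} |N_t|} \sum_{t=1}^{T} \sum_{i \in N_t} \tilde{z}_i \, \frac{y_t^i - \hat{y}_t^i}{y_t^i} \right|,
\qquad \tilde{z}_i = \frac{z_i - \bar{z}}{\sigma_z}
```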
$L_{fairness}$ measures the linear relationship between the sensitive attribute $z$ and MPE across time and space. To be specific, $\tilde{z}_i \cdot \frac{y_t^i - \hat{y}_t^i}{y_t^i}$ measures the joint deviations of $\tilde{z}_i$ and $\frac{y_t^i - \hat{y}_t^i}{y_t^i}$ from zero. Therefore, $L_{fairness}$ indicates the covariance between $z$ and MPE in the prediction, which we want to penalize in our training process.
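The joint objective can be illustrated with a plain-Python sketch. It is not the authors' training code: the function name is hypothetical, the fairness weight `gamma` and its placement are assumptions, `lam` is assumed to multiply the MAPE term, the fairness term is assumed to be the absolute value of the average product, and the population standard deviation is used for normalization. In practice the same computation would be written with differentiable tensor operations.

```python
from statistics import mean, pstdev

def debiasing_loss(y, y_hat, z, gamma=0.0, lam=10.0, threshold=0.1):
    """Return (total, accuracy, fairness) losses.

    y, y_hat: lists over time steps of lists over regions.
    z: sensitive attribute per region (e.g. share of black population).
    """
    z_bar, z_sd = mean(z), pstdev(z)
    z_norm = [(z_i - z_bar) / z_sd for z_i in z]
    abs_errors, abs_pct_errors, cov_terms = [], [], []
    for y_t, p_t in zip(y, y_hat):
        for i, (actual, pred) in enumerate(zip(y_t, p_t)):
            abs_errors.append(abs(actual - pred))
            if actual > threshold:  # keep the percentage error well defined
                pct_error = (actual - pred) / actual
                abs_pct_errors.append(abs(pct_error))
                cov_terms.append(z_norm[i] * pct_error)
    accuracy = mean(abs_errors) + lam * mean(abs_pct_errors)
    fairness = abs(mean(cov_terms))  # assumed absolute covariance-style penalty
    return accuracy + gamma * fairness, accuracy, fairness
```

When every region is underestimated by the same percentage, the fairness term is zero even though the accuracy term is not, which is the behavior the regularization is meant to reward.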

EXPERIMENTS

5.1 Data Description
The dataset utilized in this paper is a large-scale TNC trip record dataset collected from the Chicago Data Portal (53) during a 14-month period from November 1st, 2018 to December 23rd, 2019. The trip records that started between 6 AM and 10 PM are included. We partition the city of Chicago into 1km × 1km grids, and use a total of 35 × 5 grids for analysis, as shown in Figure 4. The hourly TNC demand in a region is represented by the number of trips starting from that region in a 1-hour time interval. The weather data is collected from the website of the National Centers for Environmental Information (50). The socio-demographic variables, including the percentage of black population and the percentage of low-income population, are extracted from the 2019 American Community Survey (ACS) 5-year estimates (54).
Figure 4 illustrates the distributions of average hourly TNC demand in the study period, the percentage of black population and the percentage of low-income population in the study area. From Figure 4(a), we can see that the spatial distribution of the TNC demand is highly uneven, as the downtown area takes up the majority of the TNC demand. In terms of ethnicity, Figure 4(b) reveals a bimodal distribution of African-American population, with the majority of the northern area having African-American population below 13% and the majority of the southern area having African-American population above 88%. We define population with 2019 household income lower than $25,000 as low-income, and Figure 4(c) shows that the low-income population is also mainly clustered in the south side of the study area. In this study, we define grids with over 50% of black population as the black communities, and the rest as the non-black communities, which gives us 73 black communities and 102 non-black communities. Regarding income, we define grids with more than 25% of low-income population as the low-income communities, and the rest as the high-income communities, resulting in a total of 90 low-income communities and 85 high-income communities. In both cases, the numbers of disadvantaged and privileged communities are

Model Comparison
To explore the advantage of our model SA-Net, we compare it against several other benchmark models, which are listed as follows:
• Historical Average (HA): HA predicts the TNC demand by averaging the historical demand in the same relative time interval (i.e. the same time of day and the same day of week) in the training set. For instance, the TNC demand on Monday at 10 AM is predicted as the average of the demand on Mondays at 10 AM in the training set.
• Moving Average (MA): MA predicts the TNC demand by averaging the several nearest historical values in the same relative time interval. We use the average of the 6 previous TNC demand values in grid $(i, j)$ to predict the demand in grid $(i, j)$.
• Autoregressive Integrated Moving Average Model (ARIMA): ARIMA is commonly used for forecasting time-series data (55), and has been widely applied in traffic prediction problems (56,57). In this work, to predict the TNC demand in grid $(i, j)$, the inputs to ARIMA were the 6 previous demand values in the same relative time interval in grid $(i, j)$.
• LSTM Net: The LSTM Net processes the TNC demand in each grid separately. The hyperparameters and the structures of the LSTM Net and the SA-Net are the same. The only difference is that while we use a stack of SAC-LSTM layers to process the spatial-temporal data as shown in Figure 3, the LSTM Net uses LSTM modules to process the TNC demand data and does not capture spatial dependencies.
• LSTM + Social Net: The LSTM + Social Net adds a socio-demographic feature map to the LSTM Net to facilitate predictions. The feature map is constructed as a linear combination of different socio-demographic variables as shown in Figure 2, and is fused with other parts of the network in the last model layer following Equation 7.
• Conv-LSTM Net: The Conv-LSTM Net is a fusion convolutional LSTM specified in (3). The hyperparameters and the structure of the Conv-LSTM Net are the same as those of the SA-Net and the LSTM Net. The difference is that the Conv-LSTM Net uses the traditional Conv-LSTM modules instead of the SAC-LSTM modules in Figure 3 to process the spatial-temporal TNC demand data.
• Conv-LSTM + Social Net: Similar to the LSTM + Social Net, the Conv-LSTM + Social Net adds a socio-demographic feature map to the Conv-LSTM Net to facilitate predictions.
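The two simplest baselines in the list above, HA and MA, can be sketched for a single grid cell as follows. This is an illustrative reading, not the authors' code: the function names are hypothetical, the history is assumed to be a list of `(timestamp, demand)` pairs, and "same relative time interval" is interpreted as the same day of week and the same hour.

```python
from datetime import datetime, timedelta
from statistics import mean

def same_slot_history(history, target):
    """Past demand values sharing the target's day of week and hour, oldest first."""
    matches = [(ts, d) for ts, d in history
               if ts < target
               and ts.weekday() == target.weekday()
               and ts.hour == target.hour]
    return [d for _, d in sorted(matches, key=lambda pair: pair[0])]

def historical_average(history, target):
    """HA: mean of all past demand in the same relative time interval."""
    return mean(same_slot_history(history, target))

def moving_average(history, target, k=6):
    """MA: mean of the k most recent past values in the same relative time interval."""
    return mean(same_slot_history(history, target)[-k:])
```

Both functions raise `statistics.StatisticsError` when no matching history exists, which a real pipeline would need to handle explicitly.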

Experiment setup
When training Conv-LSTM Net and SA-Net, we use kernels of size 3 × 3. Each Conv-LSTM cell and each SAC-LSTM cell consists of 64 filters/channels to capture the spatial information. The experiments are implemented in PyTorch using the mini-batch stochastic gradient descent method with a batch size of 64 and a step size of 0.001 in each training run. The model that produces the lowest prediction loss on the validation set among the 300 epochs is chosen. For both Conv-LSTM Net and SA-Net, we train the model with the number of layers being 1, 2 and 3 and choose the one that produces the lowest prediction loss. The selected model then performs prediction on the test data. We run the training procedure 3 times and report the average prediction results on the test set.

Results
We compare our proposed algorithm (SA-Net with bias-mitigation regularization) with baseline models along two dimensions, accuracy and fairness, and show that our algorithm achieves better results on both. The better prediction accuracy is demonstrated by lower MAE and MAPE. The better prediction fairness is shown in two respects: first, our model reduces MAE for the disadvantaged groups to a greater extent than for the privileged groups compared with Conv-LSTM Net; second, the proposed bias mitigation strategy reduces the MPE gap between disadvantaged and privileged groups without harming the overall prediction accuracy, thus achieving a fair prediction.

Prediction accuracy
The spatial-temporal deep learning algorithms (i.e., Conv-LSTM Net, Conv-LSTM + Social Net and SA-Net) outperform the classical statistical models (i.e., HA, MA, ARIMA), and our proposed SA-Net model produces the smallest overall MAE and MAPE among all models on the test set. Table 1 reports the overall MAE and MAPE, as well as the MAE and MAPE for the black, non-black, low-income and high-income communities. Regarding the overall MAE and MAPE, Conv-LSTM Nets and SA-Net significantly outperform other models, indicating the predictive power of models that capture both temporal and spatial dependencies. Conv-LSTM + Social Net performs slightly worse than Conv-LSTM Net, probably because of overfitting. This result suggests that simply adding the socio-demographic variables as predictors does not improve the model performance. We now compare the two best-performing models in terms of overall MAE and MAPE: Conv-LSTM Net and SA-Net.
Comparing Conv-LSTM Net and SA-Net, we find that SA-Net reduces MAE for both the black and non-black communities. It also reduces MAE for both the low-income and the high-income communities. These results indicate that by incorporating the socio-demographic information, SA-Net benefits both disadvantaged and privileged groups. In addition, the disadvantaged communities experience a larger decrease in MAE with SA-Net compared with Conv-LSTM Net. From Conv-LSTM Net to SA-Net, the reductions in MAE for the black and non-black populations are 0.12 and 0.05, respectively, and the reductions in MAE for the low-income and high-income populations are 0.10 and 0.03, respectively. This result shows that our proposed SA-Net not only improves the overall model performance, but also promotes fairness in predictions by delivering larger accuracy gains for the disadvantaged populations while not harming the performance for the privileged populations.
Note: for the deep learning models, we report the results when the models are trained with γ = 0.
Next, SA-Net improves prediction accuracy for the black communities at all times of day compared with Conv-LSTM Net. We examine the model performance for Conv-LSTM Net and SA-Net across different times of day in Figure 6. The upper row of Figure 6 shows the predictive results for the black communities, whereas the bottom row of Figure 6 shows the results for the non-black communities. Nevertheless, it is worth noting that the MAEs for the black communities generated by the deep learning models are always higher than those generated by the classical statistical approaches (i.e., HA, MA, ARIMA), as shown in Table 1. These results may indicate that while the deep learning models significantly advance the overall prediction accuracy, this accuracy gain mainly comes from the improved model performance for the privileged communities, while the prediction accuracy for the disadvantaged communities could become worse. This highlights the necessity of minimizing the potential harms for the disadvantaged groups as scholars increasingly favor deep learning models over the traditional methods, and our approach contributes to this goal by bringing down the MAE for the disadvantaged communities in deep learning models.

Prediction fairness
Having demonstrated the superiority of our proposed SA-Net over the benchmark models in terms of prediction accuracy, we now test the effectiveness of our bias mitigation strategy stated in Section 4.6 for fairness improvement. First, we test the results when the sensitive attribute is race, namely when z denotes the proportion of black population in Equation 14. Table 2 presents the results, and Figures 7(a) and 7(b) plot MPE for black and non-black groups as well as the overall MAE. Table 2 shows that when the de-biasing regularizer is not applied (γ = 0), both Conv-LSTM Net and SA-Net produce large MPE gaps between black and non-black groups. Specifically, the MPE gap (race) with γ = 0 is 0.361 for Conv-LSTM Net and 0.272 for SA-Net. For both models, the large MPE gap comes from a large, positive MPE for the black group and a small, negative MPE for the non-black group. Note that the magnitude of a positive MPE indicates the degree of underestimation of the demand, since MPE represents the average gap between the actual and predicted demand weighted by the actual demand. The larger the MPE, the more severe the underestimation. Therefore, the large MPE gaps between black and non-black groups indicate that training models using the traditional objective function without bias mitigation leads to systematic underestimation for the black group compared with the non-black group.
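To make the sign convention concrete, the sketch below computes MPE for two groups and their gap on made-up numbers. It assumes MPE is the mean of (actual − predicted) / actual over observations with positive actual demand, and that group membership is given as a boolean flag per observation; both are assumptions for illustration, and the function names are hypothetical.

```python
from typing import Sequence


def mean_percentage_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of (actual - predicted) / actual; positive means underestimation.

    Observations with zero actual demand are skipped (assumption), since
    the ratio is undefined there.
    """
    terms = [(y - y_hat) / y for y, y_hat in zip(actual, predicted) if y > 0]
    if not terms:
        raise ValueError("no observation with positive actual demand")
    return sum(terms) / len(terms)


def mpe_gap(actual, predicted, is_disadvantaged) -> float:
    """MPE of the disadvantaged group minus MPE of the privileged group."""
    dis = [(y, p) for y, p, d in zip(actual, predicted, is_disadvantaged) if d]
    pri = [(y, p) for y, p, d in zip(actual, predicted, is_disadvantaged) if not d]
    mpe_dis = mean_percentage_error([y for y, _ in dis], [p for _, p in dis])
    mpe_pri = mean_percentage_error([y for y, _ in pri], [p for _, p in pri])
    return mpe_dis - mpe_pri


# Toy data: low-demand cells are underpredicted, high-demand cells are not.
actual = [10.0, 20.0, 100.0, 200.0]
predicted = [5.0, 15.0, 110.0, 200.0]
flags = [True, True, False, False]
gap = mpe_gap(actual, predicted, flags)
```

In this toy case the disadvantaged group has a positive MPE (underestimation) and the privileged group a small negative one, which mirrors the pattern the text describes for γ = 0.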
Recognizing the prediction bias that arises when only the accuracy loss (L_accuracy) is used in training, we adopt bias mitigation by increasing the bias mitigation weight γ from 0 to 5 and 10. The results for "MPE gap (race)" in Table 2 show that for both models, as γ increases, the MPE gap between black and non-black groups decreases, and this reduction in MPE gap mainly stems from the reduction in MPE for the black group. Specifically, when increasing γ from 0 to 10, the MPE gap between the black and non-black groups drops from 0.361 to 0.084 for Conv-LSTM Net, and from 0.272 to 0.111 for SA-Net. Mitigating the racial bias also reduces the MPE gap between the low-income and high-income groups, probably because most low-income and black communities are clustered in the south side of Chicago (Figure 4), so that mitigating bias for race simultaneously reduces the prediction bias (MPE gap) for income.
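The role of the weight γ can be illustrated with a small sketch of a regularized objective. This is not Equation 14: it is a stand-in that assumes the accuracy term is MAE and that the fairness term is the absolute Pearson correlation between the sensitive attribute z and the per-observation percentage error, in the spirit of decorrelating the two. The names pearson and debiased_loss are hypothetical, and a real implementation would use differentiable tensor operations rather than plain Python.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float], eps: float = 1e-12) -> float:
    """Pearson correlation; returns 0.0 when either input is (nearly) constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx <= eps or vy <= eps:
        return 0.0
    return cov / math.sqrt(vx * vy)


def debiased_loss(actual, predicted, z, gamma: float) -> float:
    """Accuracy term (MAE, assumed) plus gamma times a decorrelation penalty.

    Assumes every actual value is positive so the percentage error is defined.
    """
    n = len(actual)
    accuracy = sum(abs(y - p) for y, p in zip(actual, predicted)) / n
    pct_err = [(y - p) / y for y, p in zip(actual, predicted)]
    fairness = abs(pearson(z, pct_err))
    return accuracy + gamma * fairness


# z: proportion of the sensitive group per cell (made-up values).
z = [0.9, 0.8, 0.2, 0.1]
actual = [10.0, 10.0, 10.0, 10.0]
biased_pred = [5.0, 6.0, 9.0, 10.0]  # underprediction grows with z
loss_gamma0 = debiased_loss(actual, biased_pred, z, gamma=0.0)
loss_gamma10 = debiased_loss(actual, biased_pred, z, gamma=10.0)
```

With γ = 0 the objective reduces to the accuracy term alone; a larger γ penalizes predictions whose percentage error tracks z, which is the behavior the text reports as a shrinking MPE gap.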
Figures 7(a) and 7(b) plot MPE for black and non-black groups as well as the overall MAE corresponding to Table 2. As we increase the bias mitigation weight (γ), the prediction MPEs for the black population (denoted by the green bars) decrease considerably, indicating that with the bias mitigation loss function, the underestimation of TNC demand for the black population has been mitigated.
Then, we apply the same bias mitigation strategy to mitigate the MPE gap between the low-income and high-income groups. In this case, z denotes the proportion of low-income population in Equation 14. The results for "MPE gap (income)" in Table 3 show that, similar to the bias mitigation results for race, the MPE gaps decline as γ increases when mitigating the income bias for both Conv-LSTM Net and SA-Net. Specifically, when γ increases from 0 to 10, the MPE gap between the low-income and high-income groups decreases from 0.193 to -0.044 for Conv-LSTM Net, and from 0.127 to 0.059 for SA-Net. The MPE gaps between the two income groups are also plotted in Figure 7(c) and Figure 7(d), where we can see that the de-biasing regularization works especially well to reduce the MPE gap in the Conv-LSTM Net.
In addition, we find that improving prediction fairness does not necessarily sacrifice prediction accuracy. The orange dots in Figure 7 denote the MAEs produced by different models, which show that compared with no bias mitigation, only Conv-LSTM Net shows a slight increase in MAE when mitigating the racial bias (Figure 7(a)). In the other cases, applying bias mitigation also brings down MAE. Notably, when increasing the mitigation weight γ for income from 0 to 5 for SA-Net, the prediction accuracy improves (MAE = 6.198) compared with the case when no bias mitigation is adopted (MAE = 6.279).
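For scale, the relative changes implied by the values quoted in the text for γ = 0 versus γ = 10 (and, for MAE, γ = 0 versus γ = 5) can be worked out directly. The percentages below are simple arithmetic on those quoted numbers and are not additional results; for the Conv-LSTM Net income gap the sign flips, so the comparison is made on magnitudes.

```latex
\begin{align*}
\text{MPE gap (race), Conv-LSTM Net:}\quad & \frac{0.361 - 0.084}{0.361} \approx 76.7\% \text{ reduction} \\
\text{MPE gap (race), SA-Net:}\quad & \frac{0.272 - 0.111}{0.272} \approx 59.2\% \text{ reduction} \\
\text{MPE gap (income), Conv-LSTM Net:}\quad & \frac{0.193 - \lvert -0.044 \rvert}{0.193} \approx 77.2\% \text{ reduction in magnitude} \\
\text{MPE gap (income), SA-Net:}\quad & \frac{0.127 - 0.059}{0.127} \approx 53.5\% \text{ reduction} \\
\text{MAE, SA-Net, income, } \gamma = 5\text{:}\quad & \frac{6.279 - 6.198}{6.279} \approx 1.3\% \text{ reduction}
\end{align*}
```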
We also examine the change of average MPE at different times of day under different bias mitigation strategies in Figure 8. Figure 8(a) and 8(d) show that by increasing the bias mitigation weight γ from 0 to 5 and 10, the MPE for the black communities consistently decreases at all times of day for both the Conv-LSTM Net and the SA-Net, and the bias mitigation effect is slightly stronger in the Conv-LSTM Net case. For the Conv-LSTM Net, we observe that when γ = 10, the morning peak MPE decreases to around zero, and the MPE for the evening period (after 6 PM) becomes negative. In contrast, Figure 8(b) and 8(e) show that the effects of the bias mitigation method on the MPE for the non-black communities are relatively small. The increase of γ is associated with a small drop of MPE in the Conv-LSTM Net case and a small rise of MPE in the SA-Net case. Figure 8(c) and 8(f) plot the gaps in MPE between the black and non-black communities given by Conv-LSTM Net and SA-Net, which show that at all times of day, increasing the bias mitigation weight reduces the MPE gap between the black and non-black communities. All in all, our results suggest that our proposed bias mitigation strategy can significantly mitigate the travel demand underprediction issue for the black communities at all times of day with both the Conv-LSTM Net and the SA-Net, and can effectively reduce the prediction bias between the black and non-black groups.
In summary, our proposed de-biasing regularization method can considerably reduce the prediction bias measured by the MPE gap between the disadvantaged and the privileged groups for both Conv-LSTM Net and SA-Net. This gain in prediction fairness can be achieved while keeping the prediction accuracy high. For SA-Net, adopting bias mitigation can even increase prediction accuracy.

Spatial patterns of errors
To better understand the spatial heterogeneity of the prediction errors, we show in Figure 9 the spatial distributions of MPE using SA-Net for three prediction strategies: prediction with no bias mitigation, with race bias mitigation and with income bias mitigation. Areas with positive MPE (indicating that the TNC demand has been underestimated) are shown in red, whereas areas with negative MPE (indicating demand overestimation) are shown in blue. Figure 9(a) shows that when no bias mitigation is adopted, the south side of the study area, which has greater populations of low-income and African-American people, suffers from severe demand underestimation. When we add the bias mitigation for race and income, the results in Figure 9(b) and Figure 9(c) show that the grid colors in the southern areas have become much lighter, and the colors of several areas in the south switch from red to blue, suggesting that the underestimation issue has been remarkably alleviated.
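How strongly per-cell errors such as those mapped in Figure 9 cluster in space can be summarized with a spatial autocorrelation statistic such as Moran's I. The sketch below computes global Moran's I on a rectangular grid, assuming binary rook-contiguity weights (cells sharing an edge are neighbors); the spatial weights actually used for the reported values are not stated in this section, so this choice and the function name morans_i are assumptions.

```python
from typing import List


def morans_i(grid: List[List[float]]) -> float:
    """Global Moran's I on a grid with binary rook-contiguity weights.

    I = (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where W is the sum of all weights w_ij.
    """
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    dev = [[v - mean for v in row] for row in grid]
    denom = sum(d * d for row in dev for d in row)
    cross, w_total = 0.0, 0
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    cross += dev[i][j] * dev[ni][nj]
                    w_total += 1
    if denom == 0 or w_total == 0:
        raise ValueError("Moran's I is undefined for constant or single-cell grids")
    return (n / w_total) * (cross / denom)


# Errors concentrated in one half of the area (clustered) versus alternating.
clustered = [[1.0] * 4, [1.0] * 4, [-1.0] * 4, [-1.0] * 4]
checkerboard = [[1.0 if (i + j) % 2 == 0 else -1.0 for j in range(4)] for i in range(4)]
i_clustered = morans_i(clustered)
i_checkerboard = morans_i(checkerboard)
```

Values near +1 indicate that similar errors sit next to each other, values near −1 indicate alternation, and values near 0 indicate no spatial pattern, so a drop in Moran's I after mitigation corresponds to less spatially concentrated error.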
Moran's I has been calculated for MPE in each scenario. Moran's Index, or simply Moran's I, is a spatial autocorrelation coefficient that has been widely used to measure the degree to which geographic events are clustered in the study area. The results show that by applying bias mitigation for race and income, the Moran's I value decreases from 0.477 to 0.284 and 0.39, respectively, which demonstrates that our bias mitigation method can help reduce spatial clustering of MPE.
First, though previous studies have shown that there are significant differences in travel characteristics across demographic groups (58,59,60), most spatial-temporal research has failed to account for the socio-demographic heterogeneity in travel demand predictions. We propose a new model structure, SA-Net, that can flexibly capture the variations of spatial correlations across different socio-demographic groups. The experimental results show that our proposed SA-Net can both improve the overall prediction accuracy and promote fairness by delivering larger accuracy gains for the disadvantaged (i.e., black and low-income) communities while not harming the performance for the privileged (i.e., non-black and high-income) communities.
Second, we find that previous solutions to spatial-temporal travel demand prediction problems tended to underestimate demand for the low-demand regions. This is because high errors in low-demand areas significantly impact MAPE, so optimizing MAPE would likely underestimate demand in these areas. To tackle this issue, we use the mean percentage error gap to measure prediction fairness, and propose a novel regularization method to mitigate the bias between the disadvantaged and privileged groups by disentangling the correlation between the sensitive attribute and the mean percentage error. Our experimental results show that the new algorithm can effectively mitigate prediction bias for both the traditional Conv-LSTM Net and the new SA-Net. It can also protect the disadvantaged regions against systematic underestimation. Our results also show that the new method can improve prediction fairness while retaining high prediction accuracy.
Overall, we argue that the prediction bias issue revealed in this work should attract the attention of researchers and policy makers, because if the travel demand in the disadvantaged neighborhoods is systematically underpredicted, we may fail to provide enough TNC services to these communities, and the limited services will in turn further decrease ridership, eventually creating a self-reinforcing cycle of declining service and ridership. The method proposed in this study has been shown to be capable of tackling this prediction bias issue and promoting both accuracy and fairness.
We identify several future research directions worth investigating. First, this paper evaluates fairness in travel demand prediction and demonstrates the utility of the de-biasing mitigation method on Conv-LSTM Net and the SA-Net. However, the proposed fairness evaluation metrics and the bias mitigation method are widely applicable. They can also be applied to other spatial-temporal deep learning networks such as the spatial-temporal residual networks ST-ResNet (61) and RSTN (10). Second, this study aims to implement fair predictions for on-demand ride service. However, our proposed fairness-enhancing method should also work well for other spatial-temporal settings, such as bikeshare demand prediction, public transport demand prediction and crime incident prediction. Future research can test the performance of the proposed method on various downstream applications. Third, we test our method on Chicago's TNC data as the real-world application. Future work can study the transferability of our method to other applications or cities.

FIGURE 2: Feature map (F) construction. Z^p_{ij} represents the value of the socio-demographic variable Z^p for pixel [i, j]

FIGURE 3: The structure of SA-Net

FIGURE 4: Distributions of TNC demand, black population and low-income population in the study area

Figure 5 illustrates the average TNC travel demand by time of day, separated by the disadvantaged (black/low-income) and privileged (non-black/high-income) communities. The travel demand in the privileged regions is much larger than that in the disadvantaged regions, so the y-axis scales differ between Figure 5(a) and Figure 5(b). The privileged regions and the low-income communities have two peak periods, 7 AM - 10 AM and 5 PM - 8 PM, whereas the black communities have only the morning peak.

FIGURE 5: Average TNC travel demand by time of day

Figure 6(a) and 6(b) show that SA-Net produces smaller MAE and MAPE than Conv-LSTM Net at all times of day for the black communities. Figure 6(d) and 6(e) show that, for the non-black communities, SA-Net produces smaller MAE and MAPE than Conv-LSTM Net in the morning period (6 AM - 10 AM), but slightly larger MAE and MAPE in the evening peak period (6 PM - 8 PM).

Figure 6(c) shows the MPE for the black communities. The MPEs are consistently positive, indicating that both Conv-LSTM Net and SA-Net underpredict the travel demand for the black communities at different times of day. However, SA-Net consistently gives smaller MPEs than Conv-LSTM Net, showing that the former model reduces the magnitude of the underprediction of the black communities' travel demand. The MPEs for the non-black communities with Conv-LSTM Net and SA-Net are more similar at different times of day, as shown in Figure 6(f).

FIGURE 7: Performance measures by model and sensitive variable, corresponding to Table 2 and Table 3

FIGURE 8: MPE for different racial groups with different mitigation weights (γ) by time of day

FIGURE 9: Spatial distributions of mean percentage errors using SA-Net for different bias mitigation strategies

TABLE 1: Accuracy comparisons among different models

TABLE 2: Fairness and accuracy comparisons with bias mitigation for race

TABLE 3: Fairness and accuracy comparisons with bias mitigation for income