A Systematic Literature Review on Machine Learning in Shared Mobility | IEEE Journals & Magazine | IEEE Xplore

A Systematic Literature Review on Machine Learning in Shared Mobility


Abstract:

Shared mobility has emerged as a sustainable alternative to both private transportation and traditional public transport, promising to reduce the number of private vehicles on roads while offering users greater flexibility. Today, urban areas are home to a myriad of innovative services, including car-sharing, ride-sharing, and micromobility solutions like moped-sharing, bike-sharing, and e-scooter-sharing. Given the intense competition and the inherent operational complexities of shared mobility systems, providers are increasingly seeking specialized decision-support methodologies to boost operational efficiency. While recent research indicates that advanced machine learning methods can tackle the intricate challenges in shared mobility management decisions, a thorough evaluation of existing research is essential to fully grasp its potential and pinpoint areas needing further exploration. This paper presents a systematic literature review that specifically targets the application of Machine Learning for decision-making in Shared Mobility Systems. Our review underscores that Machine Learning offers methodological solutions to specific management challenges crucial for the effective operation of Shared Mobility Systems. We delve into the methods and datasets employed, spotlight research trends, and pinpoint research gaps. Our findings culminate in a comprehensive framework of Machine Learning techniques designed to bolster managerial decision-making in addressing challenges specific to Shared Mobility across various levels.
Page(s): 870 - 899
Date of Publication: 21 November 2023
Electronic ISSN: 2687-7813


SECTION I.

Introduction

Over the past decades, shared mobility has proven to be a sustainable alternative to private mobility and established public transport. It promises to reduce the number of private vehicles on the road while offering users flexibility in mobility [1], [2]. Numerous new services have emerged in major cities, ranging from car-sharing, ride-sharing, moped-sharing, bike-sharing to e-scooter-sharing [3], [4]. In this context, Shared Mobility Systems (SMS) offer users short-term access to meet their mobility needs [5]. Despite initial low usage rates of SMS in the early 2010s [6] and a temporary slump in demand due to the Covid-19 outbreak in 2020 [7], the adoption of SMS has been steadily growing [8], [9]. This trend is fueled by ongoing shifts in consumer behaviors and expectations, driven by advancements in mobile information and communication technologies [8], [10], [11]. However, despite the promising outlook for SMS providers, operating such a business remains largely unprofitable in many cases [12], [13]. Service providers also grapple with municipal governance issues [14], [15] and mixed reactions from residents concerning right-of-way rules, public safety, parking, and liability [12], [16]. For instance, due to public discourse, Paris recently banned e-scooters from its streets [17]. Furthermore, competition is intensifying [18], especially in larger cities. SMS providers are vying for market share and must deliver an exceptional customer experience and efficient service operation to ensure long-term business success [8], [9].

SMS providers rely on expansive IT platforms to adeptly manage the vast number of passenger trips, vehicles, and drivers, which can range from thousands to tens of millions daily [19], [20]. Efficiently handling the data generated by myriad vehicles and docking stations across the service area is pivotal for the smooth operation of these platforms [21]. This encompasses tasks such as matching drivers with riders, route optimization, providing accurate wait time and fare estimates, tracking vehicle locations in high-demand zones, overseeing financial transactions, and handling customer support [22], [23]. Poor management of SMS can adversely impact both environmental and economic sustainability. For example, ineffective repositioning of e-scooters can lead to an oversupply in some regions and a shortage in others, escalating operational costs. This imbalance can also diminish the service’s appeal to customers, resulting in profit losses [24]. A decline in shared mobility’s allure can prompt a rise in private transportation, leading to increased emissions and congestion [25], [26]. Hence, to enhance the appeal and efficiency of SMS, providers must analyze service usage data to pinpoint high-demand areas and refine vehicle distribution and maintenance strategies [21].

Such Intelligent Transportation Systems have historically relied on classic Transport Engineering techniques to predict short-term and long-term traffic patterns [27]. These classic techniques, such as Origin-Destination matrices [28], have been oriented towards incorporating pre-specified patterns. However, with the advent of Machine Learning (ML) methods, there has been a paradigm shift in how transportation data can be analyzed and interpreted. Unlike the “white box” approach of traditional methods, ML offers a “black box” approach, leveraging algorithms to automatically detect patterns in data [29].

Recently, major players in the shared mobility domain, notably Uber and DiDi, have deeply integrated ML into their operations. DiDi leverages Reinforcement Learning (RL) for ride-hailing order dispatching [30], while Uber employs AI and ML across its services, from demand prediction to Estimated Time of Arrival (ETA) [31]. Their commitment to ML highlights its pivotal role in contemporary transportation, reinforcing the emphasis of this review on ML as a key methodology.

The focus of this review on the methodological considerations of ML is deliberate and informed by several compelling factors. Firstly, ML’s data-driven approach is particularly suited for the dynamic and fast-paced nature of shared mobility, enabling more adaptive solutions [32], [33]. Secondly, its capability to handle multi-dimensional challenges makes it ideal for the complex systems inherent in shared mobility [34], [35]. Thirdly, the growing adoption of ML in industry underscores its practical relevance and the urgency for academic exploration [36], [37]. Lastly, the methodological diversity within ML, including supervised, unsupervised, and reinforcement learning, provides a rich tapestry for research and application [38], [39]. This diversity, coupled with the demonstrated successes of industry leaders like Uber and DiDi, validates the focus of this review on ML. It aims to offer a comprehensive perspective that not only synthesizes current research but also identifies challenges and potential solutions in the rapidly evolving field of shared mobility.

The application of ML presents a myriad of advantages in addressing the challenges faced by SMS. ML can process the vast data generated by SMS operations and make predictions grounded in historical data [40]. Informed by these analyses, service providers can make decisions that either enhance the customer experience or boost operational efficiency [41]. For instance, predictive modeling algorithms can be employed to forecast demand, discern usage patterns, and estimate other factors pivotal for shared mobility service providers [42], [43]. Leveraging these predictions, providers can fine-tune their fleet size and route planning, and tailor communication and services to distinct user groups [44]. Moreover, (deep) RL algorithms can be harnessed to dynamically optimize shared mobility services, learning from past experiences and adapting the system’s actions in real time. Such adaptations can aid providers in elevating their service efficiency while curtailing costs [45]. In this review, we delve into ML, encompassing supervised, unsupervised, and reinforcement methodologies, as a solution to management challenges in shared mobility.

Understanding the benefits of integrating ML into operational processes for decision-making is vital [46]. From an organizational perspective, decision-making is the commitment to a chosen course of action [47]. It involves selecting actions that align with an organization’s goal and objectives [48]. This process includes decisions about resource allocation, objective setting, and determining the organization’s trajectory, often aiming to boost organizational performance [49]. Decisions impact not just operational efficiency but also broader societal concerns like urban congestion and environmental sustainability. Hence, informed and strategic decision-making is crucial for shared mobility providers’ success.

Thus, employing this technology can offer a competitive edge, paving the way for sustained business success [50]. However, current research on ML application in SMS is fragmented, with insights primarily found in individual publications addressing specific issues. While there are existing literature reviews on the use of these methodologies in SMS, they either focus narrowly on individual types of SMS or are overly broad in exploring the potential of ML for service providers (see Section II). Consequently, there is a noticeable gap in research regarding the holistic implementation of ML in the service delivery process of SMS providers.

In this literature review, we offer an overview of ML applications in SMS to aid service providers in decision-making and daily operations. Our contributions to the research on ML application are twofold:

  1. We illuminate the challenges of applying ML in SMS by highlighting identified issues, available datasets, and methodologies used. We also pinpoint “blank spots” – essential service provider activities overlooked in current ML literature. This insight can steer future research by offering a roadmap to address existing challenges and gaps. It is also valuable for practitioners aiming to enhance their services and cut costs. Consequently, we suggest potential areas for future research.

  2. We consolidate the fragmented knowledge on ML application into a unified decision-support framework for service providers. This offers a clear perspective on ML’s diverse application areas and presents a holistic strategy for harnessing ML to bolster decision-making and boost SMS efficiency.

To elucidate our contributions, the subsequent Section II delves into related work. Section III outlines our methodological approach. Section IV delves into the application of ML in SMS. Section V offers an overview of the datasets employed in the publications we’ve examined. In Section VI, we organize our findings into a comprehensive conceptual framework designed to aid the decision-making of SMS providers. This is followed by a discussion of our results. Finally, Section VII concludes the paper and provides insights into potential future work.

SECTION II.

Related Work

There is an abundance of literature reviews on shared mobility from various perspectives: management [51], transportation [52], [53], [54], information systems [55], [56], and interdisciplinary [57], [58]. Similarly, reviews have focused on ML techniques and their applications in diverse contexts [59], [60], [61]. Despite the recognized potential of ML as a solution for various management challenges [46], [50], only a few reviews bridge the gap between shared mobility and ML.

Table 1 offers an overview of literature reviews that focus on SMS and the application of ML techniques. We define SMS as mobility services that enable multiple users to share a pool of vehicles, such as bikes, scooters, or cars, for their mobility needs. These systems operate via mobile apps and Internet connectivity, enabling users to locate and book vehicles, track usage and location, and manage payments [62].

TABLE 1 Analysis and Differentiation of Related Reviews in the Fields of Shared Mobility and Machine Learning

The mission of these systems is to provide convenient, efficient, and eco-friendly mobility solutions. It is nonetheless essential to recognize that commercial providers of SMS pursue their own survival and profit maximization [63]. Hence, their operational and decision-making strategies are often tailored to maximize profitability, even if this sometimes comes at the expense of parts of their mission, such as environmental sustainability [64]. However, the same organizational goal can be pursued through different operational and decision-making strategies. For instance, one SMS might follow a cost-leadership strategy and offer generic vehicles at a low price, while an SMS with a strong environmental sustainability mission might follow a differentiation strategy, offering carbon-neutral vehicles at a higher price [65].

SMS providers share a common set of operational decisions, ranging from market entry to fleet sizing, station localization, dispatching, and repositioning [66], [67]. Organizational decision-making can be categorized into three levels.

  1. Strategic Level: Focuses on long-term planning and goal-setting.

  2. Tactical Level: Pertains to short-term decisions that bolster the overarching strategy.

  3. Operational Level: Addresses day-to-day or real-time operations, emphasizing efficient resource utilization.

These levels collaboratively ensure the organization’s decisions are well-informed and effective. In line with this, [68] and [46] posit that ML can enhance organizational decision-making processes. Moreover, SMS can be broadly categorized into modes where users share a vehicle (e.g., car-sharing, bike-sharing) or a ride (e.g., ride-sharing) [62], [69].

Most reviews focus on one type of SMS, offering in-depth analyses of ML applications within specific sharing modes [70], [71], [72]. Only a few works, such as [40], [73], [74], consider both types but limit their analysis to specific levels of organizational decision-making. For vehicle-sharing systems, [70] focus on bike station locations at tactical and operational planning levels using operational research and spatial-temporal analysis. Complementing this, [75] discuss promising future ML techniques for bike-sharing based on a literature review. For car-sharing, [71] provide a conceptual decision-support framework, offering insights into various operational decisions. Conversely, for ride-sharing systems, [76] and [72] review ML applications for various management decisions.

Reviews like [40], [73], and [74] adopt a broader lens, considering a wider range of shared mobility services. For instance, [40] reviews ML techniques for various management decisions across different shared mobility modes, emphasizing system requirements, but only at the operational planning level.

In contrast, [73] focuses on traffic forecasting using ML based on data from ride-hailing, bike-sharing, and other sources. Reference [74] provides a broad overview of shared mobility trends, applications, and case studies, with a subset considering ML techniques.

In conclusion, while existing studies on ML and SMS often focus on specific aspects, our goal is to provide a comprehensive understanding of the literature, presenting a holistic review of ML techniques in SMS to assist decision-makers at all three levels of decision-making. This approach aims to bridge the current research gap and offer a more rounded perspective on the topic.

Because SMS providers face common operational decisions and pursue profit maximization as a goal, an integrated review of both types of SMS that considers a wide range of management decisions seems fruitful. Decisions made by one SMS provider offering station-based bike-sharing services might not be easily transferred to an SMS provider offering e-scooter services. However, solution approaches might be adapted to one’s own business, for example in response to an environmental change or governmental regulation.

SECTION III.

Methodology

This study aims to provide a comprehensive overview of the application of ML methods in decision support for SMS. To achieve this objective, we conducted a systematic literature review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol [77], [78]. The literature on ML for SMS was sourced from academic peer-reviewed journals.

Our search strategy encompassed both current technologies and those under research, as reviewing the present landscape of shared mobility will offer insights into the evolution, adaptation, and application of ML in the industry. We adopted a three-stage methodological approach based on the PRISMA protocol:

  • Stage 1 (Planning):

    This stage involved defining research objectives, selecting keywords, and establishing inclusion and exclusion criteria. Our aim was to identify dominant AI technologies in shared mobility and understand the challenges and opportunities they present. The search phrase combined synonyms for ML methodologies with keywords related to the shared mobility domain (refer to Table 2). We adapted our search string to fit each database’s guidelines. Our search was limited to peer-reviewed journal articles, and we used Scopus and Web of Science for our search, focusing on articles published since 2012. The search, conducted in July 2022, yielded 8,984 results. After filtering out ineligible topics and duplicates, 2,144 articles remained. A pilot test on 40 randomly selected papers helped refine our extraction and coding process.

  • Stage 2 (Conducting the Review):

    At this stage, we screened titles and abstracts, retaining 584 articles relevant to our research objectives. We excluded articles that lacked a focus on ML or had no relation to shared mobility. Our inclusion criteria ensured articles were “explicit, reproducible, and without a priori assumptions” [79], [80]. A forward and backward reference search further refined our selection, resulting in 218 articles for in-depth analysis.

  • Stage 3 (Reporting):

    In this final stage, we analyzed the selected articles based on predefined categories and problem sets. This involved examining management-related challenges, applications across various mobility modes and sharing types, and the techniques and data employed. We also identified publicly available datasets and assessed their utilization in academic research.
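The selection funnel described in the three stages can be summarized with a little arithmetic. The following is a minimal sketch that uses only the article counts reported above; the step labels and the helper name `retention_rates` are illustrative and not part of the review protocol.

```python
# Article counts reported for each step of the selection process.
funnel = [
    ("database search (Scopus, Web of Science)", 8984),
    ("after removing ineligible topics and duplicates", 2144),
    ("after title and abstract screening", 584),
    ("retained for in-depth analysis", 218),
]


def retention_rates(steps):
    """Return (label, count, share_of_previous, share_of_initial) per step."""
    initial = steps[0][1]
    previous = initial
    rows = []
    for label, count in steps:
        rows.append((label, count, count / previous, count / initial))
        previous = count
    return rows


for label, count, of_prev, of_init in retention_rates(funnel):
    print(f"{count:>5}  {of_prev:6.1%} of previous  {of_init:6.1%} of initial  {label}")
```

Read this way, roughly one in four search results survived the topic and duplicate filter, and a little over 2% of the initial results reached the in-depth analysis.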

TABLE 2 Keywords Used for Systematic Literature Review Focusing on ML Methodologies in Transport Modes

Our primary criteria for inclusion were articles that specifically discussed the application of ML techniques in the context of SMS. We excluded studies that

  • Were not written in English.

  • Lacked empirical evidence or were purely theoretical without practical implications.

  • Focused solely on traditional transportation methods without any relevance to shared mobility.

  • Did not provide clear methodologies or results related to ML applications in SMS.

We are aware of potential biases within this research process, such as database constraints or the temporal scope. We address these issues through a transparent review process and, for example, by relying on Scopus and Web of Science, both prominent databases. Moreover, we elaborate on potential biases in our limitations section.

In summary, our systematic review, aligned with the PRISMA guidelines, offers a structured and comprehensive analysis of the literature on ML techniques in SMS. This approach ensures a thorough understanding of the current landscape and provides insights for future research and applications.

SECTION IV.

Results

In this section, we present a comprehensive exploration of the multifaceted dimensions of the use of machine learning in Shared Mobility Systems (SMS). Each subsection delves into specific aspects that are pivotal for the effective operation and management of SMS through the lens of machine learning applications.

For readers interested in a synthesized overview, we have developed a comprehensive framework in Figure 2 that is presented in the discussion section. This framework serves as a roadmap, highlighting the tasks performed and relationships between various subtopics, and can be referred to for a more holistic understanding of the content presented here.

FIGURE 1. - PRISMA Flow Chart.

FIGURE 2. - ML-Supported Decision-Making Framework.

User Analysis offers a deep dive into the motivations, preferences, and behaviors of SMS users. By understanding the user’s psyche, operators can tailor their services to better meet user needs and expectations, ensuring higher satisfaction and retention rates.

Demand Analysis focuses on the broader patterns of SMS usage. It examines the temporal and spatial dynamics of demand, providing insights into when and where services are most sought after. A significant aspect of this analysis is the use of white box models combined with extensive feature engineering. This approach not only captures the intricacies of traffic patterns but also offers a transparent and interpretable framework to explain and understand these patterns. Such insights are crucial for operators to anticipate high-demand areas and times, allowing for proactive resource allocation.

Dispatching delves into the strategies to ensure vehicles are optimally positioned to meet user needs. Within this subsection:

  • Demand Prediction is forward-looking, focusing on enhancing model performance by employing sophisticated neural network architectures to achieve accurate and reliable forecasts of future demand.

  • Repositioning addresses the challenge of uneven vehicle distribution, proposing strategies to relocate vehicles from low-demand to high-demand areas.

  • Matching explores the algorithms and techniques used to pair users with available vehicles, ensuring a swift and efficient connection between demand and supply.

  • Estimated Time of Arrival (ETA) delves into predictive models that provide users with accurate arrival times, enhancing user experience and trust in the service.

  • Pricing examines dynamic pricing strategies that can influence user behavior, potentially steering demand to off-peak times or underutilized areas.

Infrastructure Planning emphasizes the importance of strategic groundwork, especially when entering new markets. It discusses considerations ranging from station placements to fleet sizes, ensuring that SMS integrates seamlessly into urban environments and aligns with user behaviors and preferences.

Together, these subsections paint a holistic picture of the current state of machine learning applications in SMS research, offering invaluable insights for those at the forefront of shared mobility’s evolution.

A. User Analysis

Shared mobility user analysis aims to understand users’ attitudes, behaviors, and preferences, primarily using data from questionnaires and natural language. This understanding is pivotal for service providers. Analyzing user sentiment not only pinpoints areas for enhancement but also helps tailor services to user needs [81]. While analyzing existing customers can foster loyalty by catering to their specific needs [44], insights from social media discussions about the services can guide targeted customer acquisition strategies. Knowledge of current customer behavior and potential user preferences empowers providers to make informed decisions on pricing, marketing, and operations. In a competitive market, understanding the user base is key to outpacing rivals and ensuring sustained success. Given this, ML techniques are invaluable for processing and analyzing extensive, often unstructured, transactional or social media data. It is worth noting the distinction between user analysis and OD-based demand analysis due to their unique scopes and methods. Insights on user behavior from demand data are intentionally excluded in the user analysis. For a deeper dive into demand data analysis, see Section IV-B.

The existing literature advocating the use of ML techniques for SMS providers spans various sharing schemes, including dockless systems [82], [83], [85], [86] and hybrid models encompassing both docked and dockless systems [44], [84]. These studies encompass a range of SMS types, from ride-hailing and car-sharing to bike and e-scooter sharing. Notably, moped sharing remains underrepresented in the reviewed literature.

From our review, methodologies like sentiment analysis, topic modeling, and classification algorithms have been employed to dissect data from diverse sources, including social media, surveys, and online news platforms. For instance, [83] introduces a framework to profile ride-sharing users, aiming to enhance service quality by matching users with similar interests. Reference [82] delves into user sentiments about ride-hailing services, pinpointing areas like customer satisfaction. Echoing this sentiment-driven approach, both [85] and [84] spotlight trending topics and user perspectives from social media dialogues. Reference [86] investigates user inclinations towards e-scooters under varied scenarios, including the influence of the COVID-19 pandemic. Unique in its approach, [44] employs agglomerative hierarchical clustering to segment users based on OD data from a car-sharing provider.
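To illustrate the kind of user segmentation mentioned above, the following is a minimal sketch of agglomerative hierarchical clustering with average linkage. It is not the pipeline of [44]: the user identifiers, the two features (trips per week and mean trip distance), and the choice of three segments are all assumptions made for illustration, and real features would normally be standardized before distances are computed.

```python
from itertools import combinations
from math import dist

# Hypothetical per-user features derived from trip records:
# (trips per week, mean trip distance in km).
users = {
    "u1": (1.0, 12.0),
    "u2": (1.5, 11.0),
    "u3": (9.0, 2.0),
    "u4": (10.0, 2.5),
    "u5": (5.0, 6.0),
}


def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of user ids."""
    total = sum(dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerative(points, n_clusters):
    """Start with one cluster per user and merge the closest pair repeatedly."""
    clusters = [frozenset([name]) for name in points]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)


segments = agglomerative(users, n_clusters=3)
print(segments)
```

On this toy data the procedure separates occasional long-distance users, frequent short-distance users, and an in-between user. In practice the number of segments would be chosen by inspecting the merge distances rather than fixed in advance.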

In summarizing the reviewed studies, it is evident that they provide valuable insights into user preferences and sentiments, equipping SMS providers with the knowledge to enhance their services and boost customer satisfaction. However, a notable gap persists in comprehending the dynamic nature of customer sentiments and preferences. This void could be addressed by integrating public sentiment data, such as those from platforms like Twitter, with proprietary service provider data, offering a clearer picture of how public sentiment correlates with real-world service use. The temporal dynamics of user sentiments, especially in light of significant external events, present an intriguing area of study. It is also essential to understand the cross-cultural and regional nuances in user behaviors and preferences, suggesting the potential need for region-specific models. While current research taps into questionnaires and natural language data, integrating diverse data sources, such as geolocation or app usage statistics, could offer richer insights. Ethical considerations around privacy, security, and potential model biases become paramount as we delve deeper into user data. Investigating how service alterations, be it pricing adjustments or new vehicle introductions, influence user sentiment and behavior is crucial. Beyond traditional demographic markers, innovative user segmentation, perhaps based on mobility patterns or service usage frequency, could be enlightening. Establishing a feedback loop that rapidly reincorporates user analysis insights back into the service promises continuous improvement. Furthermore, there’s a pressing need for a more detailed, ML-aided study of SMS user satisfaction and loyalty. This research should consider the diverse service types, like car-sharing versus bike-sharing, and the varied user demographics, including age, gender, and socioeconomic background. 
Understanding the pillars of long-term user loyalty in shared mobility and the differing behaviors across various service types can guide service design and marketing strategies. Lastly, the impact of external factors, from urban planning decisions to environmental concerns, on user attitudes warrants exploration, providing a holistic view of the shared mobility landscape. Delving into these areas promises to reveal critical insights, paving the way for providers to optimize their services and cultivate lasting customer relationships.

B. Demand Analysis

Demand analysis is a data analysis technique used in the field of shared mobility to understand patterns of demand for transportation services in different locations at different times. The objective of demand analysis is to assess the demand for mobility services in various spatial and temporal regions, often incorporating additional data, which enables service providers to comprehend demand trends and to deploy their vehicles in a more targeted and effective manner [87], [88]. Unlike demand prediction models (Section IV-C1), demand analysis emphasizes the explainability of the models and their outcomes, as well as the impact of various features on the model’s results. For example, utilizing the outcomes of demand analysis, SMS providers can expand their services or fleet in regions with high demand, while reducing services or re-allocating vehicles in areas with low demand, leading to optimized vehicle utilization, lower operational expenses, and enhanced customer service. Hence, demand analysis can be understood as a cross-sectional activity to explore data and make informed decisions for a wide range of management problems.

As shown in Table 4, demand analysis is applicable to all forms of shared mobility. However, compared to other types of shared mobility, research in the area of demand analysis has been more focused on ride-sharing and, especially, docked bike-sharing. The most widely used format in demand analysis is converting origin-destination information from trip data into spatio-temporal data, which is then supplemented by additional data such as weather [42], [89], [90], [91], socio-economic [92], [93], [94], and environmental [10], [95], [96] data, to provide a more thorough understanding of demand patterns.
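The conversion from trip records to spatio-temporal data can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the trip records, zone labels, weather values, and the function names `hourly_pickups` and `add_weather` are hypothetical, and hourly aggregation by origin zone is only one of several possible resolutions.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip records: (start time, origin zone, destination zone).
trips = [
    ("2022-07-01 08:05", "A", "B"),
    ("2022-07-01 08:40", "A", "C"),
    ("2022-07-01 08:55", "B", "A"),
    ("2022-07-01 09:10", "A", "B"),
    ("2022-07-01 17:30", "C", "A"),
]

# Hypothetical hourly weather observations keyed by the start of the hour.
weather = {
    "2022-07-01 08:00": {"temp_c": 18.0, "rain_mm": 0.0},
    "2022-07-01 09:00": {"temp_c": 19.5, "rain_mm": 0.4},
    "2022-07-01 17:00": {"temp_c": 24.0, "rain_mm": 0.0},
}


def hourly_pickups(records):
    """Count trip starts per (hour, origin zone)."""
    counts = Counter()
    for start, origin, _destination in records:
        hour = datetime.strptime(start, "%Y-%m-%d %H:%M").replace(minute=0)
        counts[(hour.strftime("%Y-%m-%d %H:%M"), origin)] += 1
    return counts


def add_weather(counts, observations):
    """Join each (hour, zone) count with that hour's weather features."""
    rows = []
    for (hour, zone), n in sorted(counts.items()):
        features = observations.get(hour, {})
        rows.append({"hour": hour, "zone": zone, "pickups": n, **features})
    return rows


demand_table = add_weather(hourly_pickups(trips), weather)
for row in demand_table:
    print(row)
```

Each row of the resulting table pairs a demand count with contextual features, which is the shape that interpretable models such as regressions or tree ensembles typically expect as input.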

TABLE 3 Summary of Related Work on User Analysis by Mobility Type, Data Source, and Analysis Method
TABLE 4 Summary of Related Work on Demand Analysis by Mobility Type, Data Source, and Analysis Method

For instance, weather can impact the usage of shared mobility services, as various weather conditions can either increase or decrease the likelihood of customers using certain transportation modes [91], [97], [98]. Population density and income levels, which are part of socio-economic data, can also affect demand, as regions with higher population density and income tend to have a higher demand for shared mobility services, which are seen as a convenient and economical alternative to personal vehicle ownership [10]. Environmental factors, such as air quality, traffic congestion, and infrastructure [99], [100], can likewise encourage travelers to substitute shared mobility for individual transportation.

To elaborate further, [89] used binary probit models to identify different usage types (e.g., round trips or bike substitution) and capture the conditions that introduce systematic variations in docked bike-sharing travel behavior. In the study of [101], a demand analysis model for a docked bike-sharing system was employed to detect spatial patterns of rental stations. Further, the authors leveraged their findings and applied retail location theory to examine potential locations for future installation of public bike rental stations. In the study by [102], k-means clustering was used for spatial and temporal aggregation to examine the spatial distribution of origins and destinations of ride-sharing trajectory data for improved matching efficiency. Based on the results, the authors employed a geographically weighted regression model to evaluate the impact of surge pricing (a temporary hike in prices during high-demand periods) on drivers’ decision-making. Reference [42] investigates the differences between two forms of micro-mobility: dockless e-scooter-sharing and docked bike-sharing. To achieve this, the authors analyze the hourly number of trips and the median trip duration, taking into account various factors such as weather, gasoline prices, local events, day of week, and time of day. In [98], the relationship between the usage of shared e-scooters, dockless e-bikes, and docked bicycles and weather conditions, events, and holidays was studied. The researchers employed Prais-Winsten and Negative Binomial regressions, as well as a Random Forest model, to address some of the distributional difficulties in the trip models. They discovered that local events, in particular, could have a significant effect on demand. Finally, the authors of [103] analyze riding behavior of dockless bike-sharing in two distinct study areas using multisource data and employ tree-based ensembles and partial dependence plots to uncover nonlinear effects.

The field of spatial-temporal data analysis in shared mobility encompasses a wide range of topics, exploring the impact of various factors on demand for transportation services. As evident from the results in Table 4, many publications lean towards more basic ML techniques, such as Generalized Linear Models (GLMs) or Random Forests, emphasizing the explainability of the models. The factors influencing demand vary significantly across different modes of transportation. For a deeper understanding of emerging modes like dockless e-scooters, comprehensive analyses comparing different providers or geographical contexts would be invaluable. As shared mobility providers venture into new cities, it is intriguing to consider whether insights from one city’s demand analysis can be extrapolated to another. Moreover, the research landscape on user satisfaction and loyalty in shared mobility hasn’t fully explored the potential impact of the physical and social environment on user attitudes. While some studies have delved into the effects of service quality and pricing, there’s a noticeable gap in examining the influence of shared mobility facility design, the quality of the surrounding built environment, and the behavior of other users or prevailing social norms. These elements could significantly shape user perceptions of service quality, safety, and overall satisfaction. In this evolving landscape, machine learning offers untapped research avenues. Enhanced model interpretability, for instance by combining more complex models with techniques like SHAP or LIME, can provide both predictive accuracy and explainability. Cross-modal analysis using multi-task learning can reveal how demand dynamics in one mode influence another. Transfer learning can adapt models across cities, considering cultural and infrastructural nuances. As data collection scales, ensuring model fairness becomes crucial, making fairness-aware machine learning methods indispensable. 
Anomaly detection can identify short-term demand spikes in data, for example enriched with event data, and feature importance can refine models to focus on impactful demand determinants.

C. Order Dispatching

In intelligent transportation technologies, order dispatching aims to reduce the supply-demand imbalance of transportation resources [124]. Large-scale IT platforms, for example Uber, Lime, or DiDi, continuously receive ride requests on the order of thousands to tens of millions per day [19], [30]. A reasonable spatial and temporal distribution of traffic resources can help to maximize utilization and improve customer experience (e.g., waiting time, passenger’s distance, etc.) while minimizing the average idle time of vehicles and resource usage (e.g., fuel usage) [125]. However, solving the order-dispatching problem optimally is challenging due to the stochastic, dynamic nature of supply and demand. Multi-objective considerations, system response time, and reliability add further complexity [30]. Therefore, accurate order dispatching is vital for responding efficiently to demand and preventing supply-demand misalignment in dynamic systems.

The remainder of this section is organized as follows: First, to aid in the planning, management, and regulation of transportation systems, as well as to enhance comprehension and recognition of the spatio-temporal patterns of urban traffic, literature on real-time demand prediction is examined. Accurate predictions of urban traffic demand can lead to efficient matching and allocation of idle vehicles [126]. Second, in response to the inferred supply and demand changes, publications concerning the balancing of supply and demand to avoid over- or under-utilization of assets, referred to as vehicle repositioning, are reviewed [127]. Third, for dynamic routing and allocation of vehicles to trips, literature dealing with the Estimated Time of Arrival (ETA) is considered. In contrast to other means of transport, for ride-sharing (for which passengers and drivers connect via a smartphone app) accurate pick-up and drop-off times through ETA predictions are essential for user experience, trip planning, and efficient matching of drivers and customers [128]. Fourth, works dealing with the process of matching are reviewed, which is necessary in ride-sharing to assign customer ride requests to available drivers. This is done by adhering to specific policies such as maximizing the driver’s earnings or minimizing the time the passenger has to wait [129]. Finally, dynamic pricing and incentive systems, which have been proposed in the literature to influence customer behavior or increase the profit of sharing providers during peak times, are discussed.

1) Demand Prediction

The development of demand prediction models in SMS aims to predict user supply and demand in a spatio-temporally fine-grained way in order to direct vehicles to areas of high demand before that demand arises, thus increasing utilization. Such models typically forecast short-term traffic trends for a time frame between five and sixty minutes into the future. Demand prediction models differ from traditional time series analysis, as they consider both spatial and external factors. For example, demand in one area can be affected by traffic in other areas, and external factors such as weather, events, and holidays can have an impact on demand throughout all regions. Despite significant research in traffic forecasting, spatio-temporal forecasting remains an area of ongoing study.

Table 5 indicates that traffic demand prediction is a problem that applies to all forms of shared transportation and sharing types. One area of particular interest is using Deep Learning (DL) to predict traffic demand from spatio-temporal data, which encompasses both spatial and temporal characteristics. Most studies used Origin-Destination (OD) data derived from trip data, in some cases enriched with weather, socio-economic, or environmental data. Many studies have employed techniques such as Convolutional Neural Network (CNN) (e.g., [140], [141], [167], [178], [189], [190], [193]), Graph Convolutional Neural Network (GCN) (e.g., [146], [160], [178], [179], [194]), and Recurrent Neural Network (RNN) (e.g., [134], [145], [147], [150], [160], [176], [178], [195]) for this purpose.

TABLE 5 Summary of Related Work on Demand Prediction by Mobility Type, Data Source, and Analysis Method

A common way to represent spatio-temporal traffic data includes using images, where each pixel represents a certain area in the operational area [196], and researchers have focused on using CNN to identify both spatial and temporal patterns in the data. Additionally, time is divided into smaller sub-periods, and the number of pickups and drop-offs are accumulated for each grid cell within these sub-periods. The DeepST model, which is based on Fully Convolutional Network (FCN) architecture, was first introduced by [196].
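The image-like representation described above can be illustrated with a short sketch. It assumes, purely for illustration, that each trip is already mapped to grid coordinates and to minutes since the start of the observation window; the function name `build_flow_tensor`, the 30-minute sub-period, and the two-channel layout (pickups and drop-offs) are hypothetical choices and not taken from [196].

```python
def build_flow_tensor(trips, n_rows, n_cols, n_slots, slot_minutes=30):
    """Accumulate pickups and drop-offs per grid cell and time sub-period.

    Returns a nested list indexed as tensor[slot][channel][row][col], where
    channel 0 holds pickups and channel 1 holds drop-offs. Each time slot is
    thus a two-channel "image" of the operational area.
    """
    tensor = [
        [[[0] * n_cols for _ in range(n_rows)] for _ in range(2)]
        for _ in range(n_slots)
    ]
    for trip in trips:
        events = (trip["pickup"], trip["dropoff"])
        for channel, (minute, row, col) in enumerate(events):
            slot = minute // slot_minutes
            # Ignore events outside the observation window or the grid.
            if 0 <= slot < n_slots and 0 <= row < n_rows and 0 <= col < n_cols:
                tensor[slot][channel][row][col] += 1
    return tensor


# Toy trips, invented for illustration: (minute, grid row, grid column).
toy_trips = [
    {"pickup": (5, 0, 0), "dropoff": (35, 1, 1)},
    {"pickup": (10, 0, 0), "dropoff": (20, 0, 1)},
]
flow = build_flow_tensor(toy_trips, n_rows=2, n_cols=2, n_slots=2)
print(flow[0][0])  # pickup "image" of the first sub-period
```

A sequence of such frames is the kind of input a convolutional model can consume, with spatial patterns captured within a frame and temporal patterns across frames.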

This model laid the groundwork for OD-based traffic prediction, and it has been improved in subsequent research by integrating techniques such as merging CNN and Long Short-Term Memory (LSTM) [197]. Further, [198] incorporated semantic views to capture regional demand correlations, [199] added 3D convolutions to account for temporal factors, and [200] used hexagonal grids instead of square grids for well-defined neighborhoods, a smaller edge-to-area ratio, and isotropy. Reference [141] utilized ST-ResNet, a spatio-temporal CNN-based architecture, and combined it with optimization techniques to prevent empty cruising and decrease passenger wait time. A simple CNN-based model for predicting hourly demand at bike-sharing stations, incorporating weather data, was proposed in [189].

In addition, RNN – originally used in language modeling, text generation, and machine translation – improved prediction of OD demand. The framework ASTIR, an attentive spatio-temporal inception ResNet, was proposed in [150] for short-term ride-sharing demand prediction. Reference [195] used a convolutional autoencoder with LSTM and Gated Recurrent Unit (GRU) for e-scooter demand prediction, especially for scenarios with scarce data. Reference [167] presented DCAST, a spatio-temporal model with DenseNet and GRU based on attention mechanism for dockless bike-sharing demand prediction.

As an alternative to the grid-based representation of spatio-temporal data, graph representations, where intersections are represented as nodes and road segments as edges, are prevalent. Several studies have employed Graph Neural Network (GNN)-based models for predicting request origin and destination [201]. Reference [202] introduced GraphSAGE, a framework that leverages attribute information to form node embeddings. However, this model only considers spatial dependencies and overlooks temporal patterns in the data. Reference [203] proposed a multi-task learning framework with grid embedding and LSTM, where the grid embedding models spatial dependencies and the LSTM captures temporal dependencies. Nonetheless, the model treats requests originating at node v and those destined for node v as the same, although they should be differentiated. To address these limitations, [201] proposed a representation learning model that captures both spatial and temporal dependencies through three types of neighbors: forward, backward, and geographical. This model takes the directed nature of requests into account and distinguishes between forward and backward neighbors based on request sequence. In [174], a GCN with a data-driven graph filter was suggested for predicting hourly demand at bike-sharing stations based on hidden heterogeneous relationships between stations. The authors additionally utilized LSTM recurrent blocks to capture temporal dependencies.
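To make the distinction between the three neighbor types more concrete, the following sketch derives them from a list of directed requests. It reflects one plausible reading only: here the forward neighbors of a node are assumed to be the destinations of requests starting there, the backward neighbors the origins of requests ending there, and the geographical neighbors the adjacent grid cells. The precise definitions in [201] may differ, and all function names are hypothetical.

```python
from collections import defaultdict


def request_neighbors(requests):
    """Derive directed neighbor sets from (origin, destination) requests."""
    forward = defaultdict(set)   # origin -> destinations reached from it
    backward = defaultdict(set)  # destination -> origins that lead to it
    for origin, destination in requests:
        forward[origin].add(destination)
        backward[destination].add(origin)
    return forward, backward


def geographical_neighbors(cell, n_rows, n_cols):
    """Return the edge-adjacent cells of a (row, col) cell inside the grid."""
    row, col = cell
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return {(r, c) for r, c in candidates if 0 <= r < n_rows and 0 <= c < n_cols}


# Toy requests between grid cells, invented for illustration.
toy_requests = [((0, 0), (1, 1)), ((0, 0), (0, 1)), ((1, 1), (0, 0))]
forward, backward = request_neighbors(toy_requests)
print(forward[(0, 0)], backward[(0, 0)], geographical_neighbors((0, 0), 2, 2))
```

Keeping the forward and backward sets separate is what preserves the direction of a request, which an undirected neighborhood would lose.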

The DL methods mentioned so far are referred to as black-box models, which are challenging to interpret. For this reason, various authors have explored ways to use simpler models to understand the impact of individual factors on spatio-temporal demand. For example, the study of [171] utilized GLMs to examine the effect of weather and neighboring station conditions on predicting bike-sharing station demand. Reference [139] proposed a GLM, specifically a random-effect negative binomial regression model, to predict short-term ride-sharing demand. The study also covers the impact of variables such as weather and socio-economic factors on demand modeling.
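For reference, a random-effect negative binomial model of this kind can be written as follows. This is one common textbook formulation and not necessarily the exact specification used in [139]. The symbols are notation introduced here for illustration: $y_{it}$ is the trip count in zone $i$ and period $t$, $\mathbf{x}_{it}$ collects covariates such as weather or socio-economic variables, $u_i$ is a zone-specific random intercept (assumed Gaussian here), and $\alpha$ is the overdispersion parameter.

```latex
\begin{align}
y_{it} \mid u_i &\sim \mathrm{NegBin}(\mu_{it}, \alpha), \\
\log \mu_{it} &= \beta_0 + \boldsymbol{\beta}^{\top} \mathbf{x}_{it} + u_i,
  \qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \\
\mathrm{Var}(y_{it} \mid u_i) &= \mu_{it} + \alpha \mu_{it}^{2}.
\end{align}
```

Under the log link, $\exp(\beta_k)$ can be read as the multiplicative change in expected demand for a one-unit increase in the $k$-th covariate, which is the kind of interpretability that motivates the use of GLMs over black-box models.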

Research in the field of demand prediction for ride-hailing and ride-sharing is well established, and these models can be adapted for use in other dockless systems. Trips, especially those made with e-scooters, are usually short. As users are unlikely to walk a substantial distance to access a vehicle, the proximity of vehicles to the user is essential for service quality and makes vehicle relocation a crucial aspect. To accurately reflect the walking distance, smaller spatial units may be necessary compared to those used for ride-sharing. For bikes or e-scooters, it would be valuable to conduct studies on missed usage opportunities due to the lack of nearby vehicles. Additionally, incorporating socio-economic, demographic, or environmental data into demand prediction models can contribute to the optimization of electric vehicle charging infrastructure or parking space planning, reducing charging operations in high-demand areas or addressing parking-related socio-cultural issues.

2) Repositioning

Vehicle repositioning refers to algorithms by which operators adapt their management and operation strategies in response to dynamic changes in supply and demand. To balance supply and demand, idle vehicles are reallocated in advance to areas with large demand gaps [125]. For operational purposes in dockless SMS, the service area is discretized into zones. Further, the operator controls a fleet of shareable vehicles, each with a capacity (e.g., for ride-sharing, the capacity is greater than one, otherwise one). The time frame for operation is typically divided into periods, and the rebalancing operation is run every time unit. Travelers with origins and destinations within the operational area send requests, queue up, and are assigned to vehicles dynamically by a central dispatcher.
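The zone-based setting described above can be illustrated with a deliberately simple heuristic. The sketch below is a greedy rule swapped in for illustration and is not one of the Deep-RL or optimization approaches from the literature: in each rebalancing period, zones with more idle vehicles than predicted demand donate vehicles to the zones with the largest demand gaps, nearest donor first. The function name, the inputs, and the toy numbers are hypothetical.

```python
def greedy_rebalance(idle, predicted_demand, distance):
    """Plan vehicle moves from surplus zones to zones with demand gaps.

    idle: zone -> number of idle vehicles (must list every zone).
    predicted_demand: zone -> predicted requests in the next period.
    distance: (source_zone, target_zone) -> relocation distance or cost.
    Returns a list of (source_zone, target_zone, n_vehicles) moves.
    """
    surplus = {z: idle[z] - predicted_demand.get(z, 0) for z in idle}
    donors = {z: s for z, s in surplus.items() if s > 0}
    gaps = {z: -s for z, s in surplus.items() if s < 0}
    moves = []
    # Serve the largest demand gap first.
    for target in sorted(gaps, key=gaps.get, reverse=True):
        # Prefer the closest zone that still has spare vehicles.
        for source in sorted(donors, key=lambda z: distance[(z, target)]):
            if gaps[target] == 0:
                break
            n = min(donors[source], gaps[target])
            if n > 0:
                moves.append((source, target, n))
                donors[source] -= n
                gaps[target] -= n
    return moves


# Toy instance with three zones, invented for illustration.
plan = greedy_rebalance(
    idle={"A": 6, "B": 1, "C": 0},
    predicted_demand={"A": 2, "B": 3, "C": 3},
    distance={("A", "B"): 1.0, ("A", "C"): 2.0},
)
print(plan)
```

Such a rule is myopic: it reacts to the next period only, whereas learning-based approaches aim to account for how a relocation changes supply in later periods.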

Most generally, the analysis shows that the relocation of vehicles is a problem of general relevance for service operators regardless of the offered mobility type and sharing mode. Hence, ML techniques to support the relocation task are found for nearly all types of shared mobility (e.g., ride-sharing or bike-sharing), with e-scooter-sharing being a notable exception. Most authors propose a solution based on Deep-RL approaches (e.g., [153], [154], [218]), whereas older publications in particular are limited to regression analyses for demand prediction combined with optimization approaches (e.g., [136], [206]). The formulation of the relocation optimization model differs depending on the sharing type and sharing mode: for dockless SMS, the state-of-the-art approach in literature and practice is to use grid-based environments to discretize the operating area into zones (see, e.g., [143], [153], [154], [211], [212], [213], [218]), the counterpart of stations in docked SMS (see, e.g., [173], [216], [217]). Regarding time, the considered time frame is discretized into periods for both dockless and docked SMS. Further, dynamics in system environments in the context of intelligent transportation systems and urban planning are often represented by models [206], [207]. However, with the availability of large datasets [219], [220], the complex external and spatial effects of environments, and the development of Deep-RL, model-free approaches have surged in popularity [221]. Models based on Deep-RL provide a means to learn system dynamics, utilizing rich function approximations to represent the environment in a low-dimensional space [222]. The studies under consideration also reflect this development: while early studies use discrete event models, rule-based methods, or RL [206], [207], [215], more recent studies [143], [154], [204], [208], [214], [218] adopted model-free approaches as an effective means of learning environment dynamics. 
Relating to the datasets used in literature, all papers analyzed are based on OD datasets.

Meaningful extensions of the basic problem are, for instance, adaptive model-free RL approaches that adjust to diurnal patterns in highly dynamic environments [154], joint passenger-goods transfer [143], refueling/recharging during operation [205], [214], and research on incentives for resource rebalancing in SMS [215].

Table 6 underscores a significant gap in the current literature: there’s limited research on bike-sharing and an apparent void concerning dockless e-scooter-sharing. The intricacies of mobility modes such as ride-sharing or docked car-sharing differ substantially from those of e-scooters or bike-sharing. Notably, the latter modes cover shorter distances, don’t employ drivers for relocation, and necessitate fewer relocation operations (typically just once a day, in stark contrast to ride-sharing). Given these distinctions, there’s a pressing need to develop machine learning-driven repositioning strategies tailored to the unique operational challenges of e-scooters and bikes. Deep-RL has emerged as a favored solution in recent studies, but there’s potential to delve deeper. Advanced RL techniques that can dynamically adapt to diurnal patterns, ensuring effective repositioning throughout varying demand periods, are yet to be fully explored. The increasing interplay between different transport modes in urban environments also calls for research into multi-modal repositioning. By harnessing models that optimize repositioning across diverse transport mediums, we can craft strategies that are not only efficient but also user-centric. Furthermore, as urban environments evolve, there’s an opportunity to integrate user feedback into these algorithms. This could lead to repositioning strategies that are more in tune with user needs and preferences. Another promising avenue is the exploration of whether encouraging customers to relocate vehicles themselves could mitigate challenges like improper parking in high-traffic areas. As machine learning models grow in complexity, ensuring they operate ethically and without biases becomes paramount. Future research could focus on creating repositioning algorithms that offer equitable solutions, avoiding inadvertent favoritism towards specific areas or demographics.

3) Matching

The matching problem [30], [223], [224] – also known as trip-vehicle assignment [225], and on-demand taxi dispatching [226] – is an online bipartite matching problem where both supply and demand are dynamic, with uncertainty arising from demand arrivals, travel times, and the entrance-exit behavior of drivers. Matching can be done continuously in a streaming manner or at fixed review windows (i.e., batching). Advanced matching algorithms frequently use demand prediction in some form beyond actual requests, such as the value function in RL. Further, the matching involves assigning customer requests to available drivers according to certain policies, such as increasing driver earnings or reducing passenger wait time [129]. More specifically, by analyzing spatial-temporal patterns and the hierarchical nature of the data, order matching models aid in predicting the likelihood of matching a passenger’s trip request with a driver from separate lists of orders and drivers. The consideration of matching in literature is limited to ride-sharing, which reflects the nature of SMS where users share a trip instead of a vehicle.
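The batching variant described above can be illustrated as a minimum-cost bipartite assignment over the requests and drivers collected in one review window. The sketch below enumerates all assignments by brute force, which is only feasible for very small batches and is not the method of any cited work; production systems would rely on dedicated assignment algorithms or learned value functions. The Manhattan pickup distance as the cost, the function names, and the toy coordinates are assumptions.

```python
from itertools import permutations


def manhattan(a, b):
    """Manhattan distance between two (x, y) points (assumed pickup cost)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def batch_match(requests, drivers):
    """Assign each request in the batch to a distinct driver at minimum total cost.

    requests, drivers: dicts mapping an id to an (x, y) position.
    Assumes there are at least as many drivers as requests; otherwise no
    complete assignment exists and (None, inf) is returned.
    """
    request_ids = sorted(requests)
    best_pairs, best_cost = None, float("inf")
    for chosen in permutations(sorted(drivers), len(request_ids)):
        cost = sum(
            manhattan(requests[r], drivers[d]) for r, d in zip(request_ids, chosen)
        )
        if cost < best_cost:
            best_pairs, best_cost = list(zip(request_ids, chosen)), cost
    return best_pairs, best_cost


# Toy batch collected during one review window, invented for illustration.
pairs, total = batch_match(
    requests={"r1": (0, 0), "r2": (5, 5)},
    drivers={"d1": (4, 4), "d2": (1, 0), "d3": (9, 9)},
)
print(pairs, total)
```

Replacing the distance with a policy-specific cost, such as expected waiting time or negative driver earnings, changes the objective without changing the structure of the assignment.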

It is important to note that the matching process is restricted to ride-hailing or ride-sharing services, as it involves finding matches between drivers and customers, or between multiple customers. The general problem of matching ride-sharing requests and available drivers, as well as its generalized forms, has been extensively studied in the field of operations research [224], [236], [237]. Further, there is a general shift in sequence matching research away from combinatorial optimization methods towards Deep-RL methods. Reference [227] examined the joint decision challenge of matching orders and scheduling vehicles by formulating a ride-hailing service as a problem of large-scale parallel sorting. They proposed a multi-agent RL model, where each region is treated as a single agent. Hierarchical RL based on the geographical hierarchy of regions was used to coordinate agents from different regions for long-term benefits. The matching efficiency was significantly improved by adaptively adjusting the order matching interval, particularly for delayed matching. Reference [228] created an RL framework to determine the most effective delayed matching strategy, addressing the challenges of high dimensionality and sparse rewards. Reference [218] introduced a dynamic order matching and route planning model that takes into account real-time ride-hailing demand, pricing, and vehicle location to create planned routes. The model also allows drivers to offer different prices on the expected earnings from a particular trip and future destinations during the decision-making process.

The works of [229], [231], [232], [233] are devoted to determining the likelihood of successful matchings using binary classification. In addition, in [231], [232], and [233], the importance of individual model features is considered to derive further knowledge about matching success. The aforementioned studies primarily focus on the interpretability of the models, allowing for an understanding of the factors that influence the probability of a match. As a result, simple models such as decision trees are commonly used. Reference [234] employed frequently used metaheuristics, i.e., algorithms that often have numerous parameters that need to be adjusted to attain optimal performance. Therefore, they additionally proposed a neural network to predict the parameter values that work best for a specific problem instance, and then combined it with a large neighborhood search. An algorithm is presented in [159] that utilizes trajectory data to identify common movement patterns, and then uses this information, along with passenger information and predicted destination distribution, to match passengers in a way that minimizes their travel distance. Finally, in [235], an algorithm is presented that aims to reduce the burden on drivers participating in ride-sharing by considering their likely route and proposing a dynamic meeting point that is most efficient for the driver’s route while also being accessible for the passenger.

It is evident that, coming from the field of DL, deep-RL is solely used to solve the bipartite matching problem [218], [227], [228]. The determination of the match probability is formulated as a binary classification problem, which makes all classification-related methods from the field of ML applicable [229], [231], [232], [233]. Additionally, there are sophisticated methods that utilize a combination of DL and optimization algorithms to find matches that deviate from the traditional definition of bipartite matchings [159], [235]. As autonomous vehicles become more prevalent, there’s potential to explore matching algorithms that consider the unique dynamics of self-driving vehicles in shared mobility (e.g., matching of autonomous e-scooters). Further, with the rise of multi-modal transportation (e.g., combining e-scooters, bikes, and ride-sharing), research could explore algorithms that provide optimal matches across different modes of transport.

4) Estimated Time of Arrival

Forecasting travel time is considered a crucial service in intelligent transportation systems, which greatly supports route planning, ride-sharing, navigation applications, and effective traffic management. ETA is widely used in location-based applications, where it is a vital service. However, producing accurate estimates remains challenging, as understanding the impact of various dynamic factors (such as urban flows, traffic congestion, peak hours, and special situations like holidays and events) on travel time is a complex task [238], [239].

Generally, determining the estimated time of arrival is relevant for all types of sharing. However, in literature, travel times are almost exclusively considered for ride-sharing. There is a shortage of regression models for predicting usage times for e-scooters or bike sharing systems. In addition, both origin-destination data and enhanced origin-destination data, as well as trajectories, have been employed for forecasting travel times.

Methods for ETA can be divided into two categories: route-based and OD-based. Among route-based methods, [247] suggested using a gradient-boosting regression tree model to improve the prediction accuracy and explainability of freeway travel time analysis and modeling. In [245], the relationship between consecutive road segments and the travel time for each segment is viewed as time-series data. The authors apply LSTM models for predicting the sequence and use a spatio-temporal hidden Markov method to identify the correlation among different traffic time series and subsequently predict travel time. In [246], a framework is presented that uses a GCN and an RNN for basic travel time estimation for each road segment and a graph attention network to consider the relation to adjacent road segments. Additionally, a multitask learning model is employed to predict the travel time for the entire path and for each individual road segment.

Additionally, several OD-based methods have been proposed, one of which is the multi-task representation learning model presented in [241] which showed promising results. However, this method is computationally demanding and requires a large amount of data. Recently, an ensemble technique using multi-modality data was proposed in [240], where a gradient-boosting decision tree and a deep neural network were adopted. In [243], a DL model was proposed, which predicts travel time by combining a feed-forward network and self-attention. This model focuses on spatial dependencies while ignoring temporal correlations. Additionally, [242] used CNN and GNN for spatial and temporal correlations in traffic speed prediction. The proposed model applies a dilated convolutional network architecture to take advantage of the dilation rate by increasing the covered spaces between inputs. Furthermore, [244] introduced a model with self-convolution attention integrated with a temporal convolutional network to capture spatial correlations along with temporal dependencies. They additionally adopted multi-head attention to learn attentional weights for spatial, temporal and external features and their contributions to the output. Moreover, in his dissertation, [248] investigated the prediction of travel times for bicycles. However, access to the dissertation is limited.
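The difference between the two categories can be illustrated with two naive baselines. These are simple stand-ins chosen for illustration and do not correspond to any of the cited models: the route-based estimate sums per-segment travel times along a known path, whereas the OD-based estimate only sees origin and destination and scales their straight-line distance by an average pace derived from historical trips. Planar coordinates in meters, the function names, and the toy numbers are assumptions.

```python
import math


def route_based_eta(segments):
    """Sum per-segment travel times; each segment is (length_m, speed_mps)."""
    return sum(length_m / speed_mps for length_m, speed_mps in segments)


def od_based_eta(origin, destination, history):
    """Scale the straight-line OD distance by the average historical pace.

    history: (straight_line_distance_m, duration_s) pairs of past trips.
    origin, destination: planar (x, y) coordinates in meters (assumed).
    """
    total_distance = sum(distance for distance, _ in history)
    total_duration = sum(duration for _, duration in history)
    pace_s_per_m = total_duration / total_distance
    return math.dist(origin, destination) * pace_s_per_m


# Toy inputs, invented for illustration.
route_eta = route_based_eta([(500, 10), (300, 5)])
od_eta = od_based_eta((0, 0), (300, 400), history=[(1000, 200), (2000, 500)])
print(route_eta, round(od_eta, 1))
```

The route-based variant requires the path and segment-level speeds, while the OD-based variant requires neither, which is why OD-based methods are attractive when trajectories are unavailable.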

In summary, models for predicting travel times have become increasingly complex in recent years, in line with advancements in ML. While ETA is extensively researched for ride-sharing, there’s a gap in understanding ETA for other shared mobility services like dockless e-scooters or bike-sharing. In addition, research could delve into how urban infrastructure changes, like the addition of bike lanes or pedestrian zones, impact ETA predictions.

5) Pricing

Pricing is typically addressed using dynamic pricing, which adjusts trip prices in real time based on fluctuations in demand and supply or uses prices to counteract the imbalance of vehicle accessibility in SMS. The pricing module is a macro-level tool to achieve balance between supply and demand. Moreover, incentivizing customers to relocate vehicles to more efficient locations, instead of relying on provider-based repositioning, can greatly improve the profitability of shared mobility [249]. However, a poorly designed dynamic pricing or rebalancing model can be detrimental, as it can increase costs without improving performance.
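The basic idea of adjusting prices to the local supply-demand ratio can be sketched with a simple rule. The rule below is purely illustrative and is not the pricing scheme of any provider or cited study: the linear form, the `sensitivity` and `cap` parameters, and the decision never to price below the base fare are all assumptions.

```python
def surge_multiplier(open_requests, idle_vehicles, sensitivity=0.5, cap=3.0):
    """Return a price multiplier for one zone and one time period.

    The multiplier stays at 1.0 while supply covers demand and grows
    linearly with the demand-to-supply ratio, bounded above by `cap`.
    """
    if idle_vehicles <= 0:
        return cap  # no supply at all: apply the maximum multiplier
    ratio = open_requests / idle_vehicles
    return min(cap, max(1.0, 1.0 + sensitivity * (ratio - 1.0)))


# Toy zone states, invented for illustration: (open requests, idle vehicles).
for state in [(10, 10), (30, 10), (5, 10), (100, 10)]:
    print(state, surge_multiplier(*state))
```

A customer-side relocation incentive could follow the same logic with the sign reversed, offering a discount for trips that end in zones where supply is short.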

Essentially, incentives or dynamic pricing are applicable to all forms of sharing, as long as controlling the customer’s behavior is a desired outcome. For instance, rebalancing operations for docked or dockless systems can be reduced by rewarding customers for returning vehicles to high-traffic areas. However, there is limited research on this topic, with most studies focusing on ride-sharing and using origin-destination data [250], [251], [252], [254]. Dynamic pricing is often examined in conjunction with related issues such as matching and repositioning using Deep-RL methods [215], [250], [251], [252]. Simple and interpretable models are also used to analyze existing or alternative pricing systems [253], [254].

In [251], an RL agent is used to determine pricing for each OD-pair and repositioning/charging decisions for each electric vehicle in the fleet. In [250], an RL agent and simulations are used to analyze the impact of surge pricing on reducing marginalized zones and improving spatial equity in Seoul. In [252], RL is used to optimize spatio-temporal pricing decisions for hexagon cells, with the goal of maximizing profits by adjusting per-km rates for excess mileage and driver wages. The authors of [253] collected an extensive dataset of Uber data from Madrid, Spain, and supplemented it with general time and weather data. Further, they established a general linear model to examine the influence of multiple variables on Uber’s pricing and contrasted the model coefficients with those of taxis to identify how their relative competitiveness varies over time.

In [215], the authors address the challenge of identifying the optimal incentivization strategy to maximize service levels while adhering to a budget constraint in docked sharing services like bike-sharing or car-sharing. To study this, they used a simulated spatio-temporal bike-sharing system and compared various RL algorithms. Another study, [254], examines the potential impact of unlimited usage pricing plans on bike-sharing revenue by using open trip data, designing new pricing plans, and evaluating the results using a pass choice model.

In conclusion, there exists a notable research gap concerning dynamic pricing and the effective deployment of incentives. While industry giants such as Uber, Lyft, and Didi have seamlessly integrated dynamic pricing into their business models [255], other shared service providers stand to gain from embracing similar strategies. Such approaches can not only mitigate challenges like improper parking but also diminish the necessity for resource-intensive repositioning by providers. As the adoption of dynamic pricing widens, it’s imperative for research to probe its ethical dimensions, safeguarding against unintentional biases that could marginalize specific user demographics. Beyond the technicalities of dynamic pricing algorithms, there’s a burgeoning opportunity to delve into the human side of the equation: How do consumers respond to price volatility, and can these responses be effectively anticipated and modeled? Lastly, the realm of research should extend beyond mere pricing, investigating incentive mechanisms that champion eco-friendly practices. This includes endorsing the use of e-scooters or bikes for shorter commutes over cars and advocating for travel during non-peak hours to alleviate traffic congestion.

D. Infrastructure Planning

Before venturing into a new market, strategic preparation is paramount for any enterprise. For an SMS provider, infrastructure planning stands out as a cornerstone of this preparation. It’s vital for service operators to grasp the nuances of customer behavior, such as their reasons for trips, chosen routes, and interactions with complementary services (like intermodal usage). Armed with this insight, operators can make informed decisions on station placements for docked fleet strategies or determine vehicle distribution for dockless models within their operational domain. Beyond just pinpointing locations, this understanding also aids in deducing the optimal fleet size, specifically the requisite number of vehicles.

As depicted in Table 10, research employing ML methods for infrastructure planning spans all shared mobility forms, encompassing both docked and dockless business models. Notably, the primary focus of this research has been on bike-sharing, followed by car-sharing, e-scooter-sharing, and moped-sharing. The primary data source for these analyses is OD trip data, especially when determining the number of vehicles [214], [256], [266]. Occasionally, this OD data is augmented with additional information such as land-use, infrastructure points-of-interest [90], [95], [192], [259], [260], [268], [272], or population statistics [261]. This enriched data proves especially valuable for studies on station locations, intermodal usage, and trip purposes.

TABLE 6 Summary of Related Work on Repositioning by Mobility Type, Data Source, and Analysis Method
TABLE 7 Summary of Related Work on Matching by Mobility Type, Data Source, and Analysis Method
TABLE 8 Summary of Related Work on Estimated Time of Arrival by Mobility Type, Data Source, and Analysis Method
TABLE 9 Summary of Related Work on Dynamic Pricing by Mobility Type, Data Source, and Analysis Method
TABLE 10 Summary of Related Work on Infrastructure Planning by Mobility Type, Data Source, and Analysis Method
TABLE 11 Summary of Open Datasets for SMS-Related Research

Moreover, comprehensive trip details, including trajectories combined with map data [269], [270] or infrastructure points-of-interest [264], have been utilized for route choice and station location research. Beyond OD and trajectory data, one study employed a questionnaire to ascertain the trip purpose of customers [86].

The predominant approach in infrastructure planning research leans towards basic ML methods, emphasizing explorative and descriptive analyses. Techniques like generalized linear models and clustering are recurrent themes. For instance, [261] assessed the accessibility distribution of e-scooters and shared bikes among different population groups using generalized linear models. On the other hand, [271] advocated for clustering algorithms to pinpoint e-scooter station locations, aiming to transition from sidewalk parking to designated stations.
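To make the clustering idea concrete, the following is a minimal sketch of how observed parking positions could be grouped into candidate station locations. It is not the method of [271]; it uses plain k-means (Lloyd's algorithm) with a deterministic farthest-point seeding, and the coordinates, the variable `parking`, and the choice `k=2` are illustrative assumptions.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the first
    point, then repeatedly add the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each parking event to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                updated.append(centroids[c])  # leave an empty cluster in place
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return centroids, labels


# Hypothetical parking positions in projected metres (two visible clumps).
parking = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
           (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
stations, labels = kmeans(parking, k=2)
print(stations)  # candidate station locations, one per cluster
```

In practice the number of clusters would itself be a planning decision, for example constrained by municipal rules on the number of designated parking zones.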

While initial infrastructure planning is a prerequisite before entering a market, it’s not a one-off task. It demands continuous reassessment, validation, and updates. This iterative nature ties infrastructure planning closely to spatial-temporal analysis, demand prediction, and rebalancing. Advanced ML techniques come into play here. For instance, [135] introduced a dynamic demand forecasting model to open or close stations, taking into account potential overlaps between stations. Similarly, [192] proposed a spatio-temporal graph capsule neural network to pinpoint deployment zones within e-scooter operational areas, optimizing operational efficiency. Reference [214] leveraged deep-RL to curtail operational costs by fine-tuning fleet size in relation to vehicle relocations.

Regarding intermodal usage, existing studies primarily focus on the synergy between public transit and bike or car-sharing services [194], [257], [258], [262], [267], with limited exploration of interactions between other travel modes. A significant critique of dockless models is the excessive number of vehicles cluttering public spaces. Proposals to address this issue include optimized station locations for dockless vehicles [265], [271]. An intriguing observation by [256] suggests that, counterintuitively, expanding the fleet size for car-sharing can boost service acceptance, potentially leading to increased profits.

A notable gap in infrastructure planning research is the heavy reliance on OD data, which restricts the depth of analysis. This might stem from the challenges in accessing more detailed but sensitive trajectory data. Additionally, many studies present localized findings using basic ML models without cross-validation across different regions. There’s also a discernible absence of literature employing RL to optimize station locations and vehicle numbers in SMS. By harnessing RL, providers can analyze historical usage patterns to determine optimal station locations and adjust fleet sizes based on demand, ensuring efficient use of shared mobility vehicles and minimizing public space congestion.

SECTION V.

Mobility Datasets

The vast majority of the analyzed publications use an OD dataset as the foundation for their analysis, and OD data is analyzed for all management decision support purposes. For example, [44] segments different user types based on OD data as part of the user analysis. Moreover, OD data is widely used for demand prediction, vehicle repositioning, matching, estimated time of arrival, and infrastructure planning, and it is commonly used for pricing analysis. OD datasets are frequently extended with additional datasets, a combination we refer to as OD+. For example, [137], [273] and [162] predict demand for shared mobility services based on OD data alone, while others enhance the demand prediction with historical weather data [166], [274], public transportation data [149], [173], map data [163], or census data [130]. Hence, depending on the management problem at hand, OD data is often extended with additional information to increase the explanatory power of the analysis.
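As an illustration of what extending OD data into OD+ can look like in practice, the sketch below aggregates trip records into hourly demand per origin zone and attaches weather covariates for the same hour. The record layout, the field names (`origin`, `dest`, `hour`, `temp_c`, `rain_mm`), and the decision to drop hours without a weather observation are assumptions made for this example and are not taken from any of the reviewed studies.

```python
from collections import Counter


def hourly_demand(trips):
    """Count trip starts per (origin zone, hour) from OD records."""
    return Counter((t["origin"], t["hour"]) for t in trips)


def enrich_with_weather(demand, weather_by_hour):
    """Attach weather covariates to each demand row (the OD+ step)."""
    rows = []
    for (zone, hour), count in sorted(demand.items()):
        weather = weather_by_hour.get(hour)
        if weather is None:
            continue  # no observation for this hour: drop rather than impute
        rows.append({"zone": zone, "hour": hour, "trips": count, **weather})
    return rows


# Hypothetical OD records and weather observations.
trips = [
    {"origin": "A", "dest": "B", "hour": 8},
    {"origin": "A", "dest": "C", "hour": 8},
    {"origin": "B", "dest": "A", "hour": 9},
]
weather = {8: {"temp_c": 12.0, "rain_mm": 0.0}}

features = enrich_with_weather(hourly_demand(trips), weather)
print(features)
```

The resulting rows could serve as the feature table for a demand prediction model; the same join pattern applies to public transportation, map, or census attributes keyed by zone instead of hour.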

In general, datasets containing trajectory information, social media posts, or simulation data are less frequently used, but they are often used for specific management decisions. For example, synthetic OD data was used to support pricing decisions [215].

Synthetic data can be used to simulate datasets that mimic real-world data by applying statistical and ML techniques [283]. By using synthetic data, researchers and practitioners can access larger and more diverse datasets that may not be available in the real world, while also protecting the privacy and security of sensitive data. In addition, synthetic data can be generated using models trained on real data from cities where providers already operate, and these models can be used to simulate the behavior of shared mobility users in new cities. In our literature analysis, we found the works of [284] and [285], which use generative adversarial networks to generate synthetic trip data.

In general, simulation data can be applied to all the problems under examination. However, generating simulation data requires additional effort, which can be avoided by utilizing open data. Additionally, using real data is often considered more practical for actual use cases. The scarcity of papers using simulation data in our sample may also be attributed to the fact that the substantial research area of multi-agent systems, which applies, for instance, game theory methods, was not considered in the analysis. Further, text corpora extracted from social media data are typically used for user analysis [82], [83]. Trajectory data, on the other hand, is used for a wider range of management decisions, such as demand prediction [161], [167], estimated time of arrival [246], or matching [159]. In these cases, neural networks are almost exclusively used to analyze the datasets.

It is noteworthy that large-scale datasets containing trip data are often published as open data (e.g., [219], [220]), mainly to provide insights into transportation patterns and usage, which can be beneficial for urban planning and transportation management. However, trajectory data, which includes detailed information about an individual’s movements over time, can potentially reveal sensitive information, such as place of residence, which raises concerns about data protection. Therefore, trajectory data is often kept confidential and not made publicly available as part of open data initiatives.

Nonetheless, data protection concerns are also relevant for OD data. In order to protect the privacy of individual customers, for instance, the NYC TLC and Austin shared mobility datasets have recently been aggregated into larger geographic areas. The aggregation of the data reduces the level of precision, thus making it less likely to be associated with specific individuals.
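The effect of such spatial aggregation can be sketched as follows. The example snaps trip origins and destinations to a regular grid and counts trips per pair of cells, so that exact coordinates no longer appear in the published table. The regular grid, the cell size `cell_deg`, and the sample coordinates are assumptions of this sketch; the datasets named above use their own zoning schemes, which are not reproduced here.

```python
import math
from collections import Counter


def to_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to the index of the grid cell that contains it."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def aggregate_od(trips, cell_deg=0.01):
    """Count trips per (origin cell, destination cell) pair."""
    counts = Counter()
    for (o_lat, o_lon), (d_lat, d_lon) in trips:
        key = (to_cell(o_lat, o_lon, cell_deg), to_cell(d_lat, d_lon, cell_deg))
        counts[key] += 1
    return counts


# Hypothetical trips as ((origin_lat, origin_lon), (dest_lat, dest_lon)).
trips = [
    ((40.7123, -74.0061), (40.7305, -73.9912)),
    ((40.7188, -74.0012), (40.7391, -73.9955)),
]
od_counts = aggregate_od(trips)
print(od_counts)  # both trips fall into the same origin and destination cells
```

Larger cells give stronger protection but coarser demand patterns, which is the precision trade-off described above.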

Regarding the source of mobility data, the majority of studies utilize private datasets. However, a small number of papers make use of publicly available datasets, which are summarized in Table 11. The most commonly used open datasets are from the USA, specifically the NYC datasets that contain bike-sharing or ride-hailing data, which are used 34 times (for example, in the works of [150], [167], [182]). Other datasets that are used include the ride-hailing Chicago dataset, used 4 times (in the works of [105], [135], [231], [286]), the e-scooter Austin dataset, used 3 times (e.g., in [99], [261], [287]), and the ride-hailing Washington DC dataset, used 7 times (for example, in the works of [42], [272], [288]). In addition to these open datasets, Twitter, as a source for text-based datasets, is often used for user analysis (for example, in the works of [82], [84], [85]).

We noticed a lack of benchmark datasets, which are an essential instrument for guiding computer science research efforts and assessing algorithm performance. With an increasing number of publications focusing on the application of ML algorithms to transportation research, benchmarking is a systematic way of tracking progress. The considered studies used several datasets for travel demand, based on OD or synthetic data, to develop and test novel algorithms, but the lack of standardization increases the likelihood of overfitted, unstable, and potentially non-replicable ML models. Further, in certain articles the data acquisition method was not apparent, which is especially true of articles that used web scraping. Web scraping can be a valuable means of obtaining distinctive datasets that may be challenging to obtain otherwise. Nonetheless, researchers must consider the reliability of the data as well as legal and ethical concerns [289]. Finally, the distribution of datasets is highly imbalanced, as most studies utilized OD data, while almost none of these studies used user-related data that could provide more detailed information about the users in conjunction with trip data.

SECTION VI.

Discussion

A. Discussion

Our research has illuminated the diverse applications of ML in SMS, aiding service providers in their decision-making processes. We underscore the role of ML as a methodological solution tailored to address specific management challenges essential for the efficient and effective operation of shared mobility services. Viewing the operation of a shared mobility service as a business process, the objective becomes fulfilling user demand, necessitating critical decisions by service providers.

Drawing from the three-tiered structure of organizational decision-making (strategic, tactical, operational), we align our findings from the literature review regarding the application of ML in SMS to an organizational decision-making framework (see Figure 2). This framework aims to position a shared mobility service as a compelling value proposition for potential users, echoing similar approaches found in [71] and [67]. Through this lens, we not only emphasize the potential applications of ML within a comprehensive organizational decision-making framework but also pinpoint critical activities where ML remains underutilized. Our framework, as illustrated in Figure 2, integrates ML techniques to bolster managerial decision-making. It’s a confluence of insights from strategic decision-making literature, such as [49], [68], [290], and specialized business literature focusing on shared mobility service provision, including [22], [291], [292]. Central to our synthesis is the idea that a service provider orchestrates all pivotal activities for service provision at the operational level, ensuring they align with the user’s journey. This alignment aims to meet user demands and deliver exceptional value to customers, as emphasized by [293]. The horizontal axis of our framework represents the various decision-making levels, while the vertical axis enumerates the managerial activities essential for facilitating service use, along with the associated challenges in establishing these activities.

We distinguish between user activities, which arise during the utilization of the shared mobility service, and management activities, which are essential to facilitate this usage, as highlighted by [294]. Drawing from our literature review, we aligned these challenges with ML applications, showcased at the solution layer at the framework’s base. These applications present ML techniques tailored to aid decision-making for the respective activities. Horizontally, we’ve categorized management activities based on their time horizon into strategic, tactical, and operational levels. At the operational level, the service provider directly interacts with its users. Given that data is amassed and scrutinized across all levels, we’ve pinpointed a feedback layer vertically, symbolizing the perpetual information exchange between these decision-making tiers. This structure facilitates more refined, data-informed decisions. For instance, by analyzing operational data, a service provider might refine its tactical planning regarding station placements, especially if data indicates underutilized or congested stations.

At the strategic level, service providers must initially identify a target market and specific user groups. This aligns with challenges in strategic marketing, particularly the decision regarding the target market [295]. Consequently, providers must gather and analyze information about potential target locations and diverse user groups. Organizations typically adopt either a hotspot strategy (prioritizing key locations) or a big spender strategy (targeting the most lucrative user groups) [296]. ML techniques, like natural language processing models, can guide service providers in pinpointing markets (both in terms of location and user groups) that are ripe for entry [85]. Furthermore, once a market is penetrated, these models can assist in understanding existing customers, allowing providers to customize their services to meet specific needs, thereby enhancing satisfaction and long-term loyalty [44], [297].

Subsequent to the target market decision, service providers must delineate a market entry strategy. This involves navigating challenges such as government regulations, competitive landscapes, local infrastructure, and determining the mode of entry. In the realm of shared mobility, this translates to defining the business model (dockless or docked), pricing strategy, and operational domain. For instance, [42] utilized an ML approach to contrast dockless and docked e-scooter sharing models. The insights from this comparison can guide the choice of business model. However, such studies are often confined to specific locales, making their managerial implications most relevant for providers in those or similar areas. Notably, no research currently exists that extrapolates these findings to other locations based on specific parameters.

While there are ML applications aiding in business model decisions, we found no ML tools addressing broader issues like government regulations, competitive dynamics, local infrastructure, pricing models, or operational areas. More generally, research exists on using ML for market entry decisions outside the specific context of SMS. For instance, [298] suggest using ML for sentiment analysis, which can be integrated into managerial decision frameworks like the PESTEL model. Additionally, [299] introduced an innovative ML approach to assist platform providers in determining their overarching pricing strategy. In essence, while literature exists on leveraging ML for strategic planning, such studies were beyond the purview of our review.

At the tactical level, service providers must address three pivotal activities: business deployment, awareness creation, and preparation for customer interaction [67], [300]. For business deployment, providers face decisions regarding the precise number of vehicles to deploy, the placement of stations, or - contingent on local municipal regulations - the establishment of drop-off zones. Additionally, they must determine the distribution channel for their service, whether through a proprietary smartphone application, joining a multi-sided platform, or both.

For deployment decisions, service providers can leverage exploratory spatio-temporal ML analyses to guide choices about vehicle numbers and station locations based on static demand. When entering a new market, providers might initially rely on publicly available data (as seen in Table 11) to estimate vehicle numbers. As operations progress, they can then analyze collected data to refine their deployment strategy.

While there’s evidence of ML being used to optimize vehicle and station distribution, we found no studies that assist management in choosing a distribution channel. This gap might suggest that the choice of distribution channel hinges on factors like competitive dynamics, available opportunities, and organizational capabilities, which aren’t always quantifiable. However, looking beyond the scope of this review, [301] introduced a deep learning framework designed to forecast sales across various distribution channels, offering potential guidance for this managerial decision.

For creating awareness, service providers need to specify their communication strategy according to their target group. In our review, we could not identify a paper that supports the management of shared mobility services in building and maintaining their communication strategy. Again, literature outside the SMS domain provides insights, for example [302]. In short, we could not find SMS-specific literature on ML-based decision support for distribution channel selection or target group communication, but we did find such literature outside this domain.

With the preparation of customer interaction, we refer to the provider activities that precede the actual usage of the service. This activity is almost operational already, as service providers need to decide on the actual deployment of vehicles within the operational area to satisfy user demand and achieve organizational return. This decision can be supported by spatio-temporal data mining that predicts demand, so that vehicles are allocated to the most likely pick-up station or meeting point.

At the operational level, we organized the activities of service providers along the customer journey of service usage [22], [303]. In the pre-usage phase, the potential user needs to register for the provider’s service, set up an account, and, in transition to the usage phase, make a booking or reservation (e.g., book an available e-scooter or arrange a pick-up with a ride-hailing service). Correspondingly, service providers need to provide software to access the service, offer customer support, and supply (real-time) information about the location and status of the vehicles. The usage phase spans the start of the trip, the trip itself, and the end of the trip. Therefore, providers need to grant access to the vehicle and provide customer support, e.g., for critical incidents. In the post-usage phase, the user makes the payment and updates the account, which requires a billing and feedback system from service providers.

Regarding the problems service providers have to consider, ML is predominantly applied to the dispatching process in the pre-usage and usage phases, which covers problems including matching, dynamic pricing, scheduling, routing, and estimated time of arrival. Moreover, ML is frequently applied to rebalance vehicles after the usage phase. In general, spatio-temporal data mining models or reinforcement learning are advocated in the literature to support service providers. In other words, ML is widely suggested for the operational planning of service providers. However, based on our literature review, we could identify problems where ML is not (yet) applied. In particular, application maintenance, access management, billing issues, and fleet maintenance remain blank spots. The problems of access management and billing issues belong to the broader activity of customer relationship management. Outside the literature on SMS, there is guidance on how to utilize ML for customer relationship management (e.g., [304] and [305]). Moreover, resolving billing issues may be a critical touchpoint, so service providers assign human agents to this task. Finally, for more efficient fleet maintenance, ML could be applied to predictive maintenance based on vehicle data, as proposed in [306]. In conclusion, all of the identified blank spots can be regarded as contextual blank spots in the literature on the application of ML in SMS. Hence, the extant literature on ML in SMS exhibits contextual research gaps.


B. Managerial Implications

The primary aim of this paper is to offer actionable insights for SMS providers on how to effectively leverage ML in their decision-making process. These insights are drawn from an extensive analysis of existing literature focused on the application of ML in the realm of SMS. We emphasize ML as a powerful instrument for data analytics and model formulation, thereby enhancing organizational decision-making, a point supported by existing research [46]. Organizations often grapple with scarce resources while striving to attain specific objectives. ML can support organizational decision-making at all three levels in the context of SMS. In the domain of strategic planning, ML proves particularly valuable for identifying potential markets to enter and shaping customer relationships. On the tactical planning level, ML is notably effective for assisting with resource allocation, while on the operational planning level, ML excels in improving forecasting accuracy, enhancing efficiency, and reducing costs.

At the strategic planning stage, ML can play a pivotal role in analyzing both existing SMS users and potential users, which is a critical aspect of defining target markets and identifying target demographics. In the realm of market research, ML can be employed to collect and analyze data from social media in various markets, aiding in the estimation of the likelihood of successful market entry. Additionally, ML can be applied to scrutinize the current customer base, identifying trends in consumer behavior, usage patterns, and preferences. For example, it can identify habitual SMS usage for commuting or leisure, which holds significant implications for SMS providers when devising marketing strategies: promotions like vouchers may not yield the desired results within habitual user segments, as noted in marketing literature [307]. Beyond the strategic level, ML also offers managerial implications that can guide SMS providers in optimizing their services and cultivating lasting customer relationships. Understanding user attitudes, behaviors, and preferences is crucial for service providers. Sentiment analysis and other ML techniques can help tailor services to specific user needs, fostering customer loyalty. Insights from social media discussions can guide targeted customer acquisition strategies, allowing providers to make informed decisions on pricing, marketing, and operations. Moreover, there is a notable gap in understanding the dynamic nature of customer sentiments, suggesting the need for integrating public sentiment data with proprietary service provider data for a clearer picture of public sentiment and real-world service use. It should be emphasized that in these contexts, ML serves to augment the company’s market research capabilities.

For tactical and infrastructure planning, the comprehensive review highlights the substantial role that ML can play in aiding managerial decision-making in SMS. Focusing on a variety of shared vehicles such as cars, mopeds, bikes, and e-scooters, meticulous infrastructure planning is crucial both prior to market entry and during ongoing operations. ML technologies are invaluable tools for determining the most effective locations for stations and the optimal sizes of vehicle fleets. Utilizing historical OD data, these technologies enable data-driven optimization that aims to align the placement of stations and the size of fleets with consumer demand. This is particularly important given the resource constraints that are often a significant concern for SMS providers. ML assists management teams in devising strategies that strike a balance between meeting customer demand and managing limited resources effectively. Such a balanced approach can lead to the optimization of both sales and service frequency while keeping operational expenses in check. However, it’s important for decision-makers to be aware that the effectiveness of ML is largely dependent on the availability of historical OD data [308], [309], [310]. This could pose challenges for companies looking to enter new markets where such data is not readily available. Therefore, while ML is highly effective for optimizing operations in existing markets, its utility may be limited when considering expansion into new, uncharted markets. Beyond this, managers should also engage in continuous reassessment of infrastructure planning, as it is not a one-time activity but requires ongoing validation and updates. Advanced ML techniques, such as dynamic demand forecasting models, can be used for real-time adjustments to station locations and fleet sizes. 
Inter-modal synergy should also be considered; managers should look at the interactions between SMS and other modes of public transit to enhance overall efficiency and attractiveness. Vehicle clutter management is another critical aspect, especially for dockless models. Optimized station locations for dockless vehicles can alleviate the issue of cluttering public spaces. Interestingly, expanding the fleet size can paradoxically increase service acceptance and potentially lead to increased profits. Managers should also be cautious when generalizing findings from localized studies, as many are specific to one region and lack cross-validation. Ethical and privacy concerns are paramount, given that some of the data used may be sensitive. Advanced ML techniques like spatio-temporal graph capsule neural networks and deep RL can be employed for ongoing infrastructure planning to fine-tune fleet sizes and optimize operational efficiency. A user-centric approach is also advisable; understanding the trip purposes and preferences of the users can be ascertained through questionnaires or more advanced ML techniques, providing a more user-centric service. Finally, the research shows a primary focus on bike-sharing, followed by car-sharing and other forms. Managers should consider this when planning infrastructure for less-represented forms like moped-sharing.

Our review underscores the transformative potential of ML and RL in the operational planning of ride-sharing services. Traditional techniques like rule-based systems and heuristics have been the go-to solutions for tasks such as demand prediction and vehicle dispatching. However, ML algorithms offer dynamic, real-time solutions that can significantly enhance operational efficiency, while also potentially automating traditional marketing activities like pricing or promotions. To exemplify this, we demonstrate how ML can aid in operational planning, using the context of ride-sharing as an illustration. Prior to the usage of a ride-sharing service, [218] propose an effective matching framework based on RL to handle customer requests, while [251] suggest RL for dynamically determining prices. Instead of relying on traditional matchmaking rules, such as matching a rider’s request with the nearest available driver, ML-based matchmaking allows for pairing a request with a driver who will also be dropping off another passenger nearby. Consequently, in the pre-usage phase, ML has the potential to partially automate tasks associated with driver-rider matching, potentially improving the quality and speed of matchmaking, as well as reducing search costs. During the usage phase, ML can be applied to forecast travel time, which supports route planning and navigation applications, and facilitates repositioning based on ETA (e.g., [244]). In addition, the actual repositioning activity in the post-usage phase is made more efficient through RL frameworks (e.g., [213]).
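For reference, the traditional nearest-available-driver rule mentioned above can be sketched in a few lines. This is the rule-based baseline, not the RL framework of [218]; the identifiers, the planar coordinates, and the first-come-first-served processing order are assumptions of this sketch.

```python
import math


def nearest_driver_matching(requests, drivers):
    """Greedily assign each request, in arrival order, to the closest free driver."""
    free = dict(drivers)  # driver_id -> (x, y); copied so the input is not mutated
    assignment = {}
    for request_id, pickup in requests:
        if not free:
            break  # remaining requests stay unmatched
        best = min(free, key=lambda d: math.dist(free[d], pickup))
        assignment[request_id] = best
        del free[best]  # a matched driver is no longer available
    return assignment


# Hypothetical driver positions and pick-up requests in arrival order.
drivers = {"d1": (0.0, 0.0), "d2": (5.0, 5.0)}
requests = [("r1", (1.0, 1.0)), ("r2", (0.0, 0.0))]

matches = nearest_driver_matching(requests, drivers)
print(matches)  # {'r1': 'd1', 'r2': 'd2'}
```

In this toy case the rule is myopic: the first request takes the driver standing exactly at the second request's pick-up point, so the second rider is served by the distant driver. Decisions that anticipate upcoming requests, which is what learned matching policies aim for, can avoid this kind of outcome.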

Compared to traditional methods, RL offers the ability to dynamically fine-tune strategies based on real-time variables like vehicle positioning and traffic flow, as cited by [311]. It also excels in optimizing multiple objectives through intricate reward functions, such as reducing the distance covered during repositioning, as shown by [312]. Moreover, RL demonstrates superior scalability as operational demand grows, according to [313]. As ML algorithms assume more responsibilities, ethical considerations like algorithmic bias and equitable service delivery must also be addressed. In summary, ML and RL offer robust, scalable, and dynamic solutions for optimizing the dispatching process in ride-sharing services, promising both operational efficiency and customer satisfaction.
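A reward function that combines several objectives is often written as a weighted sum. The sketch below illustrates this for repositioning; it is not the reward used in [312], and the three terms and their weights (`w_served`, `w_km`, `w_unmet`) are hypothetical values chosen only for illustration.

```python
def repositioning_reward(served_trips, reposition_km, unmet_requests,
                         w_served=1.0, w_km=0.1, w_unmet=0.5):
    """Weighted-sum reward: credit served trips, penalise repositioning
    distance and unmet demand. All weights are hypothetical."""
    return (w_served * served_trips
            - w_km * reposition_km
            - w_unmet * unmet_requests)


# One hypothetical time step: 10 trips served, 20 km driven for
# repositioning, 4 requests that could not be served.
reward = repositioning_reward(served_trips=10, reposition_km=20.0, unmet_requests=4)
print(reward)
```

The relative size of the weights encodes the trade-off the provider is willing to make, for example how many kilometres of repositioning one additional served trip is worth.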

To summarize, ML offers the potential to drive service evolution in response to user requirements, thereby fostering increased user loyalty and satisfaction. ML can allocate resources effectively by analyzing historical demand and supply patterns, thus optimizing service delivery. Additionally, ML furnishes a decision-making framework that enables adaptability in dynamic environments and addresses evolving user needs. However, it’s important to acknowledge certain drawbacks of ML. These include the risk of biased decision-making when training data is skewed and the need for ongoing maintenance and monitoring to ensure the model’s performance remains reliable over time. Additionally, the complexity of ML systems may pose challenges in terms of interpretability and transparency, making it difficult to fully understand and explain the reasoning behind their decisions. It’s worth noting that ML may offer greater accuracy compared to other methods, but it often necessitates more computing time, making efficient hardware and infrastructure crucial for its successful implementation.

To conclude, for managerial decision-makers it is necessary to carefully weigh the benefits and challenges of implementing ML within their operations of SMS. While ML can offer improved service optimization, user satisfaction, and adaptability to changing conditions, it’s essential to be mindful of the associated drawbacks. Decision-makers should consider factors such as potential biases in data and the need for substantial computing resources. In essence, while ML offers powerful tools for enhancing decision-making, it should be integrated into an organization’s strategy with careful consideration of its benefits and limitations to maximize its potential impact.

SECTION VII.

Conclusion and Outlook

A. Limitations

This review, while focused on ML methods in shared mobility, has several limitations. Firstly, the trade-off between the efficiency gains from ML and the significant resources required for its development and maintenance must be considered. Conventional methods might sometimes be adequate, rendering advanced ML techniques unnecessary. Additionally, there might be ML approaches addressing non-domain-specific issues that lie outside the scope of this review, and these might be explored in broader research areas. The articles reviewed were identified using specific keywords and selected literature databases, so relevant search terms or outlets may have been omitted. Our reliance on Scopus and Web of Science, while comprehensive, might not capture all relevant publications, especially those from recent journals or non-traditional sources, which could lead to an inadvertent omission of pertinent studies. The criteria for inclusion and exclusion, though comprehensive, might have inadvertently filtered out some relevant studies, especially those that discuss ML applications in SMS indirectly. By focusing on articles published since 2012, foundational or seminal works before this period might have been overlooked. While our aim was to capture the most recent advancements, earlier contributions could offer valuable insights. The subjective nature of screening titles and abstracts introduces potential reviewer bias, and our limitation to English articles might exclude significant research in other languages.

B. Conclusion

In conclusion, our systematic literature review has identified various applications of ML in SMS to support service providers in their decision-making. We have demonstrated that ML can be used as a methodological solution for specific management problems that need to be considered in order to operate SMS. The analysis focused on the methods and datasets used, identified research trends, and highlighted research gaps. The results of our review were synthesized into a framework of ML techniques to support managerial decision-making at different levels. To this end, we matched ML techniques with the critical activities that enable the provision of SMS. Through the use of ML, service providers can enhance their strategic, tactical, and operational decision-making to create value for their customers and improve their overall operational performance. Finally, our findings highlight the potential of ML in SMS for organizational decision-making, but we have also identified critical activities where ML has not yet been used, suggesting areas for future research.

C. Future Research

Based on our literature review, we recognized limitations in the extant literature that provide avenues for future research. Although we highlight future research avenues at the end of each findings section, we aggregate them in the following.

In general, the studies reviewed offer valuable insights into important areas for service providers, such as user analysis, demand analysis, dispatching, and infrastructure planning. However, research on user analysis, for example, has so far mostly neglected private data from service providers; combining such data with public data could help determine the impact of public opinion on the actual usage of SMS. Furthermore, for tasks such as demand prediction, vehicle repositioning, and ETA, we recognize that most research focuses on certain mobility types (i.e., ride-sharing or ride-hailing), while little or no research focuses on mobility types where users share a vehicle. Therefore, we encourage future researchers to focus on these mobility types, taking into account the particularities of sharing a vehicle instead of a ride. In this vein, further research on demand analysis of SMS could include newer modes, like e-scooters, to compare providers or different geographical locations, and could consider how the physical and social environment affects user attitudes and behavior. This could provide insights for enhancing user experiences and retaining customers. Regarding demand prediction, more research is needed to predict demand for dockless systems, especially for e-scooters on short trips. Existing research on demand prediction for ride-hailing and ride-sharing is well established, but incorporating socio-economic, demographic, or environmental data can improve models and contribute to optimizing infrastructure planning.

Further research in shared mobility, especially for problems like short-term demand prediction, should prioritize establishing standardized benchmark datasets. This would ensure consistent evaluations across studies. A unified approach to design decisions and preprocessing methodologies is also essential. By standardizing evaluation metrics, we can achieve clearer comparisons of ML techniques, enhancing the clarity and impact of research in this domain.
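As a minimal sketch of what a shared evaluation routine could look like, the following function computes two widely used error metrics for demand forecasts. The choice of MAE and RMSE, the function name `evaluate_forecast`, and the example numbers are assumptions made for illustration; the reviewed literature does not prescribe a specific metric set.

```python
import math
from typing import Dict, Sequence


def evaluate_forecast(actual: Sequence[float],
                      predicted: Sequence[float]) -> Dict[str, float]:
    """Return MAE and RMSE for paired actual and predicted demand values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")

    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n              # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)   # root mean squared error
    return {"mae": mae, "rmse": rmse}


# Illustrative hourly pickup counts for one zone (made-up numbers).
scores = evaluate_forecast(actual=[10, 20, 30], predicted=[12, 18, 33])
```

Reporting the same metrics on the same benchmark split, with the same spatial and temporal aggregation, is what would make results from different studies directly comparable.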

The literature on repositioning shows that there is limited research on dockless e-scooter-sharing, and the needs of mobility modes such as ride-sharing or docked car-sharing are markedly different from those of e-scooter-sharing or bike-sharing. Additionally, exploring ways to encourage customers to relocate vehicles could help to curb undesired behavior and improve the overall user experience. Regarding ETA, while models for predicting travel times have become increasingly complex in recent years, there is still limited research on predicting usage time for other shared mobility services such as dockless e-scooters, car-sharing, or moped-sharing. In addition, for pricing, there is a lack of research on dynamic pricing and the use of incentives. Other sharing service providers could benefit from adopting dynamic pricing methods to address negative behaviors or reduce the need for repositioning efforts. Overall, further research in these areas could provide valuable insights for service providers seeking to improve their offerings and retain customers.

Limitations in infrastructure planning research include the use of limited OD data and basic ML models with little validation across locations and providers. There is a lack of literature on the use of RL to optimize station locations and fleet size in SMS, which can be valuable for minimizing underused or overused areas and preventing overcrowded spaces.

Based on our synthesis with the decision-making framework, the extant literature on ML in SMS offers service providers ML solutions for already existing services, so that they can utilize the information retrieved from their operational business to enhance the service at the specific location. However, no ML-based solutions exist to transfer insights from operated locations to other locations, e.g., based on parameters. Such an approach might encourage service providers to enter new markets that are particularly suitable for their service offering. Therefore, for future research we recommend transferring and testing the outcomes in other locations based on parameters. Moreover, while synthesizing our results into an ML-supported decision-making framework for service providers, we recognized gaps, as no SMS-specific ML solution exists for some critical tasks of the service providers. From literature outside the scope of our review, we could identify generalized insights. However, these approaches might not satisfy the specific needs of SMS service providers, who have to deal with highly fragmented location-specific regulations and competitive landscapes. Therefore, we encourage future researchers to validate the generalized insights in the context of SMS.

ACKNOWLEDGMENT

The authors acknowledge the support provided by the Open Access Publishing Fund of Clausthal University of Technology.
