Fleet Management and Charging Scheduling for Shared Mobility-on-Demand System: A Systematic Review

Challenged by urbanization and increasing travel needs, existing transportation systems call for new mobility paradigms. In this article, we present the fleet management and charging scheduling of a shared mobility-on-demand system, whereby electric vehicle fleets are operated by a centralized platform to provide customers with mobility service. We provide a comprehensive review of system operation based on the operational objectives. The fleet scheduling strategies are categorized into four types: i) order dispatching, ii) order-dispatching and rebalancing, iii) order-dispatching, rebalancing and charging, and iv) extended. Specifically, we first identify mathematical modeling techniques implemented in the transportation network, then analyze and summarize the solution approaches including mathematical programming, reinforcement learning, and hybrid methods. The advantages and disadvantages of different models and solution approaches are compared. Finally, we present research outlook in various directions.


I. INTRODUCTION
With the decarbonization trend in the transportation sector, electric vehicles (EVs) are becoming an important part of the road transportation system [1], [2]. Meanwhile, shared mobility-on-demand (MoD) systems, such as Uber, Lyft, and Didi, can fulfill urban travel demand more efficiently than private vehicles. The utilization rate of vehicles, roads, and parking facilities in shared transportation systems is much higher than that of private cars [3] and conventional taxi fleets [4]. For example, by adopting a cloud-based fleet management and navigation platform, a shared MoD system could meet the mobility needs of the same population with roughly 60% of the conventional taxi fleet [4]. Therefore, the EV-based shared MoD system could play a more important role in the future urban transportation system. The shared MoD system is composed of three key components: the platform, customers, and drivers. The platform connects customer requests and driver-owned vehicles, dramatically changing transportation conditions in real time [5]. Passengers submit their travel demand to the platform; the trip information includes pick-up and drop-off locations, departure time, and travel mode. Typically, there are two travel options for customers: ride-sharing and ride-splitting [6]. To unify terminology, we define ride-sharing as the situation where passengers specify their departure and destination locations and wait for pick-up by a driver. In contrast, ride-splitting (or car-pooling) indicates that multiple passengers with similar routes and time schedules share a single vehicle, so fewer vehicles can satisfy more customer demand, which makes the fleet more affordable, sustainable, and time-effective [7].
The proliferation of a shared MoD system relies on fleet management and charging scheduling schemes. Multiple challenges, ranging from the macroscopic to the microscopic level, must be resolved to enable efficient fleet management and charging scheduling. We summarize three key operational challenges as follows.
(i) Efficiently match the travel request with available EVs.
(ii) Efficiently route the EV to a desired location, either to serve its current order or to prepare for a future dispatch.
(iii) Choose the right time and location to recharge the EV. In general, the shared MoD system addresses these challenges from the perspective of the fleet platform operator. The fleet operator manages the MoD system by assigning passengers to EVs and routing them, rebalancing the idle fleet by relocating vehicles to reduce the asymmetric demand distribution, and scheduling the vehicles' charging locations and times [8]. These scheduling decisions are highly spatially-temporally coupled with each other. Specifically, EVs are rebalanced to high-demand areas in advance to fulfill orders appearing in the future [9]. Meanwhile, vehicles are dispatched to the optimal charging station at the appropriate time, so the charging demand profile varies over time and space. These fleet management strategies are coordinated by the operator centrally and simultaneously, aiming at maximizing revenue or minimizing costs [10]. In this paper, we classify research into four categories based on the operational objective types.
The rest of this paper is organized as follows. The operation modeling of the shared MoD system is presented in Section II. The solving approach is discussed in Section III. The research outlook is discussed in Section IV and a summary of the paper is concluded in Section V.

II. PROBLEM MODELLING
We classify the modeling methods of the shared MoD system into four categories based on different operation objectives, which include 1) order dispatching, 2) order-dispatching and rebalancing, 3) order-dispatching, rebalancing and charging, and 4) other extended objectives, as shown in Fig. 1. The details are discussed in the following.

A. ORDER-DISPATCHING
In the order-dispatching problem, the shared MoD system optimizes matching decisions dynamically in the face of time-varying demand and stochastic scenarios. Meanwhile, matching decisions in the current period strongly affect demand and supply in subsequent periods. The platform should consider multiple short-term and long-term objectives, such as instant rewards from passenger pick-up time and fares charged, passenger and driver satisfaction, and long-term platform profit, which may conflict with each other. In addition, since the matching process considers each vehicle and request individually, the scale of the problem can be huge, which leads to the curse of dimensionality. In the real world, order matching is usually performed in real time, which requires algorithms with short solution times.
We can divide the order-dispatch problem in mobility service into two categories: matching alone, and matching combined with the vehicle routing problem (VRP). For the former, there are three types of driver-passenger matching strategies:

1) PLATFORM MATCHING
The platform assigns requests to vehicles centrally based on the vehicle distribution and travel demand. When requests are processed as soon as they occur, some works follow the first-come-first-serve rule (or first-in-first-out, FIFO) [11], [12]. Passengers with earlier requests receive priority in the response process and are assigned the nearest available vehicle [11]. Other research formulates the problem as an optimization model [13], [14], [15], [16] based on a directed graph. Bipartite graph matching is implemented with the aim of minimizing the maximal edge cost [14] or maximizing a system utility function that combines the total net profits of all taxis and the waiting time of passengers [15]. Meanwhile, a fluid model and a circle region-based model are proposed in [13] and [16], respectively.
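As a toy illustration of the difference between FIFO-style myopic matching and globally optimal assignment, the following sketch (with made-up pick-up costs; real platforms use the Hungarian algorithm or an LP solver rather than brute force) compares the two policies:

```python
from itertools import permutations

# Toy cost matrix: cost[i][j] = pick-up distance from vehicle j to request i.
# Requests are indexed in arrival order (request 0 arrived first).
cost = [
    [2, 1, 8],
    [9, 1, 7],
    [5, 4, 1],
]

def fifo_match(cost):
    """First-come-first-serve: each request, in arrival order, takes the
    nearest still-available vehicle."""
    free = set(range(len(cost[0])))
    total, pairs = 0, []
    for i, row in enumerate(cost):
        j = min(free, key=lambda v: row[v])
        free.remove(j)
        total += row[j]
        pairs.append((i, j))
    return total, pairs

def optimal_match(cost):
    """Global min-cost bipartite matching by brute force (fine for toy
    sizes only)."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return sum(cost[i][best[i]] for i in range(n)), list(enumerate(best))

fifo_total, _ = fifo_match(cost)
opt_total, _ = optimal_match(cost)
```

On this instance, FIFO greedily gives vehicle 1 to request 0 and pays dearly later, while the global matching is far cheaper, which is exactly why batched optimization can beat immediate response.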
In contrast to platforms that respond to customer requests immediately, many ride-sharing platforms collect requests within a short time window and solve the matching problem at the end of each window, a practice called request batching [17], [18], [19], [20]. Queueing theory is used to construct the batching model in [17] and [18]: a queue aggregates customer requests, and the same number of vehicles are dispatched to the queue when it reaches a threshold size [17]. A bipartite graph models batched matching as an integer program with the aim of maximizing welfare for drivers and passengers [19]. [20] formulates the batch matching problem as a multi-objective optimization and develops an adaptive matching policy, which can achieve the target-based optimal solution.

2) ORDER-GRABBING
Different from matching dominated by the platform, drivers in the order-grabbing mode choose, in a decentralized manner, among orders provided by the platform [21], [22], [23]. In this mode, the behavior patterns of drivers can affect the matching results. The combinatorial optimization in [21] optimizes overall traffic efficiency and delivers the best user experience. [23] employs a multi-network flow model to obtain the sampling probability matrix of each vehicle with the aim of minimizing flow cost.

3) MUTUAL MATCHING
Few studies have addressed satisfaction from the passengers' perspective. If user satisfaction is not handled well, the platform will lose users, ultimately reducing revenue. Therefore, [24] implements a hidden points-based bipartite graph to design matching and allocation mechanisms that significantly improve passenger satisfaction.
The latter problem, which emphasizes routing strategy in the order-dispatch process, is formulated as a dynamic vehicle routing problem (DVRP), part of the larger family of VRPs. VRPs are usually solved as static routing problems, whereby the origins and destinations of trips are known in advance. However, customer demand in MoD systems is dynamic, leading to DVRPs. [25] presents a queueing approach to the task allocation and dynamic routing strategies of vehicles. [26], [27], [28] formulate vehicle routing with pick-up and drop-off as a Markov decision process (MDP).
For the ride-splitting scenario, the policies need to provide assignments and routes that handle multiple pick-up and drop-off locations and time-window constraints. Graph-based modeling formulates the routing and matching strategy of each vehicle and request as a mixed-integer programming problem [29], [30], [31], [32]. [29] optimizes the ride-splitting problem with the aim of maximizing total profit while respecting the pick-up and drop-off times as well as the maximum ride time.
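A common building block in ride-splitting heuristics is cheapest insertion: try every position pair for a new request's pick-up and drop-off in an existing route and keep the cheapest feasible one. The sketch below (Manhattan distances, a two-seat capacity, and all coordinates are assumptions of this sketch, not taken from the cited works) illustrates the idea:

```python
CAP = 2  # vehicle seat capacity (assumed)

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def route_len(stops, start):
    """Total driving distance of a route; each stop is ((x, y), load_delta)."""
    pts = [start] + [s[0] for s in stops]
    return sum(manhattan(pts[k], pts[k + 1]) for k in range(len(pts) - 1))

def feasible(stops):
    """Check the seat capacity along the whole route."""
    load = 0
    for _, delta in stops:
        load += delta
        if load > CAP:
            return False
    return True

def cheapest_insertion(stops, start, pickup, dropoff):
    """Try every (pickup, dropoff) position pair and keep the cheapest
    feasible insertion; returns (route length, new stop sequence)."""
    best = None
    n = len(stops)
    for i in range(n + 1):
        for j in range(i, n + 1):
            cand = (stops[:i] + [(pickup, +1)] + stops[i:j]
                    + [(dropoff, -1)] + stops[j:])
            if feasible(cand):
                cost = route_len(cand, start)
                if best is None or cost < best[0]:
                    best = (cost, cand)
    return best

# One vehicle already carrying a request from (2, 0) to (4, 0); insert a
# new request from (1, 0) to (3, 0).
best = cheapest_insertion([((2, 0), +1), ((4, 0), -1)], (0, 0), (1, 0), (3, 0))
```

Here the new request slots into the existing route at zero extra distance, which is the pooling gain that makes ride-splitting attractive.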

B. ORDER-DISPATCHING AND REBALANCING
In addition to order dispatching, a critical operational objective for the MoD system operator is the repositioning of empty vehicles awaiting new passengers, which implicitly includes the vehicle routing process. Supply-demand mismatch challenges the shared MoD system, and vehicle rebalancing is an efficient way to reduce the geographic asymmetry of demand. Empty vehicles drive to high-demand areas in advance to fulfill customer requests in time and reduce passenger waiting time. Repositioning empty vehicles awaiting new passengers from a system-wide perspective is important for increasing system efficiency.
We categorize the related literature based on the modelling approaches. Basically, they can be divided into three groups: i) graph-based, ii) queueing theory-based and iii) grid-based. These three models are illustrated in Fig. 2.

1) GRAPH-BASED
For the graph-based models, the transportation network is often modeled as a directed graph including arcs and nodes. The node represents location such as a station or an area and the arc represents a combination of roads between two locations. Specifically, those graph-based models can be categorized into three major types of formulations, which are network flow, vehicle-centric and other techniques.

a: NETWORK FLOW FORMULATION
The vehicle fleet and passengers are modeled as flows, and a fluid-dynamic approach is often adopted. Fleets and customers are not represented individually but as flows between nodes [33], [34], [35], [36], [37], [38], [39]. The main constraint is flow conservation and consistency, which requires that the number of vehicles flowing into a node equal the number flowing out of it at the same time. This modeling approach reduces the problem size, but the route of a specific vehicle cannot be recovered directly. The problem can be divided into two subproblems, rebalancing and order assignment, using total unimodularity [34], which extends the approach to large-scale instances and leads to a computationally efficient scheduling algorithm for the vehicles.
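To make the flow view concrete, the following toy sketch (zone names, distances, and demand figures are all made up) computes surplus and deficit per zone and finds the cheapest rebalancing flows by brute force; real systems solve the same transportation problem as a linear program:

```python
from itertools import permutations

# Toy zones: idle vehicles on hand vs. forecast demand per zone.
idle   = {"A": 2, "B": 0, "C": 2, "D": 0}
demand = {"A": 0, "B": 2, "C": 0, "D": 2}
# Rebalancing distance (only surplus -> deficit pairs are needed here).
dist = {("A", "B"): 1, ("A", "D"): 5, ("C", "B"): 4, ("C", "D"): 1}

# One list entry per surplus vehicle / per deficit slot.
surplus = [z for z in idle for _ in range(max(idle[z] - demand[z], 0))]
deficit = [z for z in idle for _ in range(max(demand[z] - idle[z], 0))]
assert len(surplus) >= len(deficit)   # flow conservation: supply covers deficit

def cheapest_rebalance(surplus, deficit, dist):
    """Brute-force min-cost assignment of surplus vehicles to deficit
    slots; every deficit slot receives exactly one vehicle."""
    best = None
    for perm in permutations(surplus, len(deficit)):
        cost = sum(dist[(s, d)] for s, d in zip(perm, deficit))
        if best is None or cost < best:
            best = cost
    return best

rebalance_cost = cheapest_rebalance(surplus, deficit, dist)
```

The optimum sends zone A's spare vehicles to nearby B and zone C's to nearby D, rather than criss-crossing the city, which is precisely what the flow conservation constraints plus a distance objective encode.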
Though vehicle repositioning reduces customer waiting time and increases passenger throughput, there is concern that shared vehicles cause worse congestion than personal vehicles because of empty repositioning trips. Thus, many papers consider endogenous congestion, which is affected by the operation of the shared MoD system. Congestion can be modeled by capacity constraints on the total traffic flow on the road [33], [36], [38]. Road-utilization-dependent travel times are captured via a piecewise affine approximation of the Bureau of Public Roads (BPR) model [38]. Within a capacitated transportation network, research shows that rebalancing vehicles do not lead to an increase in congestion if properly coordinated under relatively mild assumptions [33].

b: VEHICLE-CENTRIC FORMULATION
Decision variables represent operational scheduling for a vehicle that will 1) wait at a node, 2) serve a customer, or 3) rebalance to another node [4]. In the simplest setting, routing of a vehicle is optimized via binary decision variables, taking value 1 if and only if the vehicle is assigned to the corresponding road link [40], [41].

c: OTHER FORMULATION TECHNIQUES
In [42] and [43], traffic flow is represented by the cell transmission model, in which each road is divided into an ordered set of cells, i.e., discrete spatial intervals that vehicles travel through. The sending flow and receiving flow are transition flows constrained by kinematic wave theory. Different from the node-based models in the references above, [44], [45] investigate a region-based model where the fleet operating regions are partitioned and discretized with demand estimation in ride-splitting mode. Taking [45] as an example, at the across-region level, the idle mileage induced by rebalancing vehicles is optimized and a robust dispatch strategy is designed. Within each region, pick-up and drop-off schedules for real-time requests are obtained for each vehicle with the objective of minimizing total mileage delay while serving as many requests as possible. [46] ignores the transportation network and models only AVs and customers.
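A minimal single-road update of the cell transmission idea can be sketched as follows (this simplified variant assumes equal free-flow and backward-wave speeds, and all capacities are made-up numbers):

```python
def ctm_step(n, q_max, n_max, inflow_demand):
    """One cell-transmission update for a road discretized into cells.
    The flow between consecutive cells is the minimum of the upstream
    cell's sending flow and the downstream cell's receiving flow."""
    sending = [min(x, q_max) for x in n]             # what each cell can emit
    receiving = [min(q_max, n_max - x) for x in n]   # what each cell can absorb
    y = [min(inflow_demand, receiving[0])]           # y[i] = inflow to cell i
    for i in range(1, len(n)):
        y.append(min(sending[i - 1], receiving[i]))
    outflow = sending[-1]                            # free discharge downstream
    new_n = [n[i] + y[i] - (y[i + 1] if i + 1 < len(n) else outflow)
             for i in range(len(n))]
    return new_n, outflow

# Three cells holding 2, 8, and 3 vehicles; flow capacity 4, jam density 10.
new_n, out = ctm_step([2, 8, 3], q_max=4, n_max=10, inflow_demand=5)
```

Note how the congested middle cell (8 of 10 vehicles) throttles the inflow from the first cell to its residual space, while vehicles are conserved across the update.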

2) QUEUEING THEORY-BASED
A queueing network is used to represent critical performance metrics such as the availability of vehicles at stations and customer waiting time [4], [47], [48], [49], [50]. When road congestion is not considered, the road network is modeled as an abstract queueing network with infinite-server road queues. A queueing theory-based model formulates the matching process in [48]. [49] models the mobility-on-demand system as two coupled closed Jackson networks with passenger loss. [50] resolves non-myopic idle vehicle relocation using queue delay as an approximation of the conditional expected cost under ride-splitting.
For the queueing modeling-based research, congestion is typically considered through capacity constraints on the queues [4], [47]. [4] proposes a queueing network model with finite-server queues within a Jackson network. In [47], the MoD system is cast within the framework of closed, multi-class BCMP queueing networks. The framework captures stochastic passenger arrivals, vehicle routing on a road network, and congestion effects.
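To illustrate the kind of availability metric such closed-network models expose, the toy sketch below evaluates a product-form stationary distribution for a two-station closed network with single-server stations; this is a simplified stand-in of our own, not the actual formulation of [4] or [47]:

```python
def availability(N, rho):
    """Probability that at least one of the N fleet vehicles is parked at
    station 1 of a closed two-station network with single-server stations,
    whose product-form stationary distribution is pi(n) proportional to
    rho**n for n vehicles at station 1 (rho = relative utilization)."""
    weights = [rho ** n for n in range(N + 1)]
    return 1.0 - weights[0] / sum(weights)

avail_small = availability(2, 1.0)   # two-vehicle fleet, balanced network
avail_large = availability(5, 1.0)   # five-vehicle fleet, balanced network
```

For a balanced network (rho = 1), availability grows from 2/3 with two vehicles toward 1 as the fleet grows, matching the intuition that larger closed fleets rarely leave a station empty.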

3) GRID-BASED
For the grid-based techniques, hexagonal grids are deployed to represent the transportation network and vehicle scheduling could be depicted as the following actions [51], [52]. The order-serving action picks up an available order from the platform and transports the passenger from the current location to the destination grid. The reposition action is moving to adjacent grids or wandering in the current grid. [52] realizes transfer between regions regarding order-dispatching and routing based on hexagonal grid modeling.
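The action space described above is straightforward to encode; for instance, with axial hexagonal-grid coordinates (an assumed convention, not taken from [51] or [52]), a reposition action set could look like:

```python
# Axial (q, r) coordinates for a hexagonal grid: every cell has six
# neighbors, and "staying" in the current cell models wandering.
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def reposition_actions(cell):
    """All grids an idle vehicle may occupy after one reposition step."""
    q, r = cell
    return [cell] + [(q + dq, r + dr) for dq, dr in HEX_DIRS]

acts = reposition_actions((0, 0))
```

This compact, fixed-size action set (seven options per cell) is one reason grid-based models pair naturally with the reinforcement learning methods of Section III.B.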

C. ORDER-DISPATCHING, REBALANCING, AND CHARGING
Building on Section II-B, vehicle energy refueling is necessary after trips are accomplished. An intelligent fleet charging policy ensures that vehicles have an adequate energy level for future actions and virtually eliminates the ''range anxiety'' issue, a major barrier to EV adoption. Moreover, when vehicles are not fulfilling trip requests, they can be routed to charging stations either to absorb excess generated energy when power demand is low (G2V) or to inject power into the power network when power demand is high (V2G). Well-managed charging scheduling not only benefits EV drivers with lower electricity costs, but also provides flexibility for grid operators to perform load balancing and renewable energy integration.
1) For vehicle-centric modeling in the transportation network, the battery level of each individual vehicle and the energy availability in the power grid are accounted for in [53]. Fleet charging/discharging and vehicle-to-grid (V2G) services are optimized on the energy layer with the aim of minimizing electricity cost over a long time scale [54], subject to constraints on vehicles' travel distances. [7] proposes joint rebalancing and V2G coordination with the aim of maximizing vehicle utilization.
2) For the network flow model, a time-expanded network flow model is developed in [8], which considers road congestion and operational constraints in the distribution and transmission power networks. Compared with Section II-B, charging characteristics are introduced at each node, so there are three kinds of vehicle flow in the transportation network: order-serving flow, rebalancing flow, and charging flow. These flows satisfy the flow continuity and consistency constraints; that is, the number of vehicles leaving a node equals the number arriving at it with the same charge level.
3) An agent-based model considers the potential of vehicle charging to supply operating reserve in [55].
The articles that do not consider the interaction between the power and transportation networks can be divided into three similar categories [56], [57], [58], [59], [60], [61], [62]. a) For vehicle-centric modeling, different time scales are considered for vehicle scheduling [57]. In [57], charging is optimized over longer time scales to minimize both approximate waiting time and electricity costs, while routing and relocation are optimized at shorter time scales to minimize waiting times, with the results of the long-time-scale optimization acting as charging constraints. b) For network flow models, differential equations are utilized to model the dynamic behavior of customers and vehicles [61]: the numbers of vehicles and customers at a node obey nonlinear time-delayed differential equations. In addition, the charging and routing problems can be decoupled under the assumption that electricity prices at the destination nodes of all current trips are unknown to the operator [59]. An electric traveling salesman problem with time windows is developed in [58] to solve customer routing and recharging with the aim of minimizing the total distance of the selected arcs and recharging paths. c) For the agent-based model, [56] predicts the battery range and charging infrastructure requirements of the EV fleet operating on Manhattan Island. [60] optimizes charging scheduling during rebalancing.
It is worth clarifying that some studies consider order assignment and vehicle charging operation while ignoring vehicle rebalancing. For queue-based modeling, [63] formulates the dispatching problem as a stochastic queueing network and employs the Lyapunov optimization technique, aiming at minimizing vehicle dispatch cost and customer waiting time. For vehicle-centric modeling, [64] optimizes routing and charging strategies with a given origin location, aiming at maximizing energy efficiency.

D. EXTENDED OBJECTIVES
In addition to system operation and fleet management, there are some extended operational objectives that arise in the shared MoD system. We classify these works into five categories: intermodal operation, pricing, planning, battery swapping, and the interaction between the transportation and power distribution networks.

1) INTERMODAL
Operating a MoD system to cover the complete city-wide transportation demand would inexorably increase the number of operated vehicles and cause congestion again due to induced demand, as customers shift from public transit to shared vehicles. The MoD system should instead cooperate intelligently with other transportation modes, such as the public transportation network or private vehicles, in order to reduce overall travel time and secure congestion-free urban mobility. Against this backdrop, some studies develop modeling and optimization methods to realize the benefits of an intermodal transportation system [65], [66], [67].
A multi-commodity network flow model is employed in [66] and [67] to capture the joint operation of the MoD system and public transit, with the aim of reducing customers' travel time. Furthermore, the joint intermodal congestion-aware routing and rebalancing formulation of the vehicle fleet is extended to a mixed traffic setting capturing the interaction between MoD users and private vehicles in [67].

2) PRICING
Trip pricing policies play an important role as they modulate the inflow of customers traveling between regions in the network. As a result, the operator chooses prices such that the induced demand ensures a balanced load of customers and vehicles arriving at each location. Additionally, selecting prices enables the operator to shape demand so that the system can operate with a smaller or larger fleet [68], [69], [70], [71]. Joint dynamic pricing, dispatching, and rebalancing strategies are optimized in [69] and [70].
From the perspective of the charging network operator (CNO), optimized charging pricing guides vehicles to charge at appropriate times and locations considering the electricity price purchased from the grid, so that the operation of the charging station network can be optimized. [68] proposes a spatial-temporal charging pricing strategy to improve the operational efficiency of the integrated charging and transportation system.
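The demand-shaping role of prices can be illustrated with a deliberately simple model; linear price-elastic demand and a hard fleet-size cap are assumptions of this sketch, not of any cited work:

```python
def best_price(d0, a, fleet, prices):
    """Trip price maximizing revenue when induced demand d(p) = d0 - a*p
    is capped by the available fleet size."""
    def revenue(p):
        return p * min(max(d0 - a * p, 0.0), fleet)
    return max(prices, key=revenue)

# 100 base trips, elasticity 10 trips per price unit, 40-vehicle fleet,
# candidate prices 0.0, 0.5, ..., 10.0.
p_star = best_price(100, 10, 40, [i / 2 for i in range(21)])
```

The revenue-maximizing price here is exactly the one at which induced demand matches the fleet, illustrating how pricing balances load rather than simply maximizing the uncapped revenue p·d(p).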

3) PLANNING
Planning problems can be classified into fleet planning and charging infrastructure planning. Fleet planning optimizes the fleet size, the battery capacity of each vehicle class for heterogeneous fleets, and the initial fleet distribution, including charge levels and vehicle locations [72], [73]. Charging infrastructure planning determines charging station siting and the number of charging bays with different charging rates, and requires optimization methods that consider the coupling between the transportation and power networks. The impact of vehicle charging behaviors on fleet operation and charging system planning is effectively evaluated in the joint fleet sizing and charging system planning model of [74].
Crucially, the operation of the shared MoD system is strongly influenced by the available charging infrastructure, which in turn should be designed to accommodate the EVs' charging activities in the best possible way. Operational scheduling is therefore essential in planning problems, so that investment costs at the planning stage and future operation costs can be balanced. At the operational level, scheduling strategies such as routing, dispatching, and charging are considered. Station siting and fleet operation can be jointly optimized, with an expanded network flow model employed for the transportation network [75].

4) BATTERY SWAPPING
Compared with long charging times, battery swapping allows an EV to swap its depleted battery (DB) for a fully-charged battery (FB) at a battery swapping station (BSS) within several minutes. If battery swapping is adopted as an alternative energy-refueling method in the shared MoD system, it not only benefits drivers with a fast refueling service but also matches available drivers with more demand during travel time, increasing fleet operation efficiency. Battery swapping is therefore appropriate for fleets, which deal with more customer requests and trip demand than private cars. [76] proposes an operational framework for an integrated shared MoD system and battery swapping station, determining the fleet scheduling and battery charging strategies.

5) INTERACTION BETWEEN TRANSPORTATION NETWORK AND POWER DISTRIBUTION NETWORK
As the installed capacity of charging infrastructure keeps increasing, the coupling between the transportation network and the power distribution network becomes an important factor. Power system operation can be significantly influenced by the fluctuating charging loads at vehicle charging or battery swapping stations, which are determined by the transportation network. Therefore, it is necessary to consider the interdependence of transportation flow and power flow, and to coordinate the optimization of the coupled operation. For the operational model of power distribution networks, AC optimal power flow (OPF) or convex-relaxation OPF models can be formed. To coordinate and optimize the operation of the transportation network and the power distribution network simultaneously, related studies usually combine appropriate traffic models and optimal power flow models to describe the operational problems of the coupled networks [77]. Appropriate electricity pricing schemes are used to influence traffic flows in the MoD system and achieve economic energy dispatch [1], [8]. A joint rebalancing and V2G coordination strategy for the transportation system is proposed in [7], where vehicle-to-grid services are facilitated by parking lots.

E. DISCUSSION
The analysis of the three modelling approaches is presented as follows. Graph-based models represent the road network topology clearly, and vehicle routes correspond to those in the real world. Vehicle operation is obtained by solving for the traffic flow on each road arc via three major types of formulations: network flow, vehicle-centric, and other techniques. Taking the network flow model as an example, each node in the graph corresponds to a tuple with three dimensions (time, location, and state-of-charge), which models the time-varying characteristics and battery charge level of the MoD fleet.
With the queue-based modelling method, a trip is modeled as a queue between nodes. Queueing theory in the MoD system deals with randomly arriving vehicles that travel on roads with finite capacity. When designing policies for MoD systems, we specify how vehicles move from one queue to another. When road congestion is not considered, the road network is modeled as an abstract queueing network with infinite-server road queues. This approach is intuitive and convenient for reflecting quality-of-service metrics, including the availability of vehicles and the waiting times of both passengers and charging vehicles.
For the grid-based models, the study area is divided into hexagonal grids and each grid can serve as a trip origin or destination. The order-serving action picks up an available order from the platform and transports the passenger from the current location to the destination grid. Rebalancing or charging decisions are modeled as orders and assigned to EVs in the form of dispatches. The reposition action moves the vehicle to an adjacent grid or lets it wander in the current grid. The state of an available EV (not in service or charging) consists of the current time step, location, and battery charge. The grid-based approach is well suited to integration with data-driven or reinforcement learning methods, since the representation of actions in this mode is more straightforward.
Therefore, we suggest selecting the modeling approach based on the objective of the study. If detailed road traffic analysis is desired, a graph-based approach might be a good choice. In contrast, a grid-based approach could be better if fast-solved fleet management decisions are the primary research objective.

III. SOLUTION METHODS
We categorize the solution methods for shared MoD system operation problems into three groups: mathematical programming, reinforcement learning, and hybrid approaches, as shown in Fig. 3.

A. MATHEMATICAL PROGRAMMING APPROACHES
We list the mathematical programming approaches based on different model formulations. Heuristic algorithms are a common way to solve dynamic traffic problems, especially when the scale of the problem is large. Model predictive control (also known as receding horizon control) is a control technique whereby an open-loop optimization problem is solved at each time step to yield a sequence of control actions up to a fixed horizon, and only the first control action is executed [78].
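The receding-horizon idea can be sketched in a few lines: at each step, solve the open-loop problem over a short horizon (here by brute-force enumeration, where practical MPC would solve an LP or MILP) and apply only the first action. All costs, action sets, and demand forecasts below are made-up toy values:

```python
from itertools import product

def mpc_step(state, forecast, horizon, actions):
    """Brute-force the open-loop problem over `horizon` steps and return
    only the first action of the best sequence (receding horizon)."""
    best_seq, best_cost = None, float("inf")
    for seq in product(actions, repeat=horizon):
        s, cost = state, 0.0
        for k in range(horizon):
            s = s + seq[k] - forecast[k]   # vehicles added minus trips served
            cost += abs(s) + 0.5 * seq[k]  # imbalance penalty + relocation cost
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq[0]

forecast = [2, 3, 1, 2]   # predicted trips leaving the station each step
state, applied = 0, []
for t in range(2):        # closed loop: re-solve, apply the first action
    a = mpc_step(state, forecast[t:t + 2], 2, range(4))
    applied.append(a)
    state = state + a - forecast[t]
```

Even in this toy, the controller anticipates the horizon: it sends exactly enough vehicles each step to track the forecast while paying the relocation penalty only where it helps.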

1) NETWORK FLOW FORMULATION
Dynamic problems are represented via a time-expanded network where nodes carry locational, temporal, and charge characteristics. A node (i, t, c) indicates that a vehicle is at physical node i at time t with charge level c. Accordingly, an edge between n1 = (i, t1, c1) and n2 = (j, t2, c2) exists if and only if j can be reached from i during the time period t2 − t1, with the charge dropping from c1 to c2. The optimization problem is formulated as a linear program and can be solved by off-the-shelf solvers even for large-scale problems [8].
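The edge-existence rule just described can be encoded directly; in the sketch below, the travel-time and energy-use tables are illustrative assumptions:

```python
# Illustrative tables: driving A -> B takes two periods and one charge level.
travel_time = {("A", "B"): 2}
energy_use = {("A", "B"): 1}

def edge_exists(n1, n2, travel_time, energy_use):
    """An arc from (i, t1, c1) to (j, t2, c2) exists iff j is reachable
    from i in exactly t2 - t1 periods, with charge dropping from c1 to c2."""
    (i, t1, c1), (j, t2, c2) = n1, n2
    key = (i, j)
    return (key in travel_time
            and t2 - t1 == travel_time[key]
            and c1 - c2 == energy_use[key])
```

Enumerating all (location, time, charge) tuples and such arcs yields the time-expanded graph on which the linear program is posed.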
Heuristic algorithms are deployed in [53] and [58], resulting in near-optimal solutions within polynomial time. [36] utilizes the Frank-Wolfe algorithm to solve the routing problem after reformulation. A congestion-aware routing scheme captures road-utilization-dependent travel times via a piecewise affine approximation of the Bureau of Public Roads (BPR) model [42]. For online realization of the problem, real-time MPC algorithms are utilized in [33], [34], [37], and [39].

2) VEHICLE CENTRIC FORMULATION
Small-scale problems can be solved by a solver directly; [64] uses a solver for a mixed-integer quadratically constrained program. The alternating direction method of multipliers (ADMM) decomposes the pick-up, delivery, and rebalancing problem with time windows (PDRPTW) into per-vehicle routing problems. Binary variables are introduced to indicate whether a vehicle traverses a road link, which yields an NP-hard program as the number of vehicles increases. To tackle the scalability issue, heuristic algorithms are deployed; local neighborhood search is employed in [40] to find routes.
Real-time implementation of optimization problems is usually combined with model predictive control or receding-horizon algorithms [7], [30], [31], [54], [57], [78]. [30] and [31] determine large-scale real-time dial-a-ride dispatching over a rolling horizon, relying on a column generation algorithm and a backbone algorithm, respectively. Model predictive control (MPC) in parallel over different time scales is implemented in [54] and [57]. Cascaded model predictive control is utilized and the problem is formulated as a mixed-integer linear program in [54]. The first MPC scheme, called the energy layer, abstracts the vehicle fleet as an aggregate storage system for the sake of model scalability, and it optimizes fleet charging and vehicle-to-grid services to minimize electricity cost over a long time scale (hours). The second MPC scheme, called the transport layer, optimizes short-term vehicle routing and relocation decisions to minimize customers' waiting times while accounting for the charging constraints derived from the energy layer.
To tackle scalability and online implementation together, [7] designs an effective distributed heuristic based on model predictive control and a genetic algorithm for the integer linear program.

3) QUEUE THEORY-BASED FORMULATION
[47] reformulates the capacitated routing and rebalancing problem as a linear program. A heuristic algorithm based on Lagrangian decomposition is proposed to address the challenge caused by the increasing number of variables [50].
For real-time implementation, an online minimum drift plus penalty (MDPP) framework is deployed to obtain the real-time dispatching strategy in [63]. A real-time closed-loop rebalancing policy for drivers is formulated as an integer linear program, which reduces to a linear program thanks to the total unimodularity of its two subproblems, rebalancing and assignment [49].

4) OTHERS
For research modeling the transportation network as regions, [62] studies a heuristic primal-dual method to optimize online charging scheduling, and [45] employs receding horizon control to minimize the idle mileage induced by rebalancing vehicles across regions toward current and predicted future requests.
For research modeling the transportation network with the cell transmission model, a traffic assignment simulator and a heuristic approach are implemented to solve dynamic ride-sharing [42]. A tabu search heuristic is deployed to solve the dynamic traffic assignment problem, which is formulated as a mixed-integer linear program [43].
When solving large-scale problems, distributed implementation provides an alternative approach. There are two general types of distributed algorithms: gradient-based and dual-variable-based. For the former, a gradient-related step is taken, followed by averaging with neighbors. For the latter, at each step and for a fixed dual variable, the primal variables are solved to minimize a Lagrangian-related function, and then the dual variables are updated accordingly. One well-known method of this kind is the alternating direction method of multipliers (ADMM), which decomposes the original problem into subproblems, solves them sequentially, and updates the dual variables associated with the coupling constraint at each iteration. ADMM decomposes the pick-up, delivery, and rebalancing problem with time windows (PDRPTW) into per-vehicle routing subproblems [2].
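As a minimal illustration of the dual-variable class, the following consensus-ADMM sketch (toy scalar quadratic objectives, not the PDRPTW model) decomposes a coupled problem into per-agent subproblems and coordinates them through averaging and dual updates.

```python
# Consensus ADMM sketch: agent i minimizes (x_i - c_i)^2 subject to
# agreement x_i = z. Each x-update has a closed form; the z-update
# averages; the scaled duals u_i enforce the coupling constraint.
def consensus_admm(c, rho=1.0, iters=50):
    n = len(c)
    x = [0.0] * n
    u = [0.0] * n          # scaled dual variables, one per agent
    z = 0.0                # consensus variable
    for _ in range(iters):
        # x-update: argmin (x - ci)^2 + (rho/2)(x - z + ui)^2, in closed form
        x = [(2 * ci + rho * (z - ui)) / (2 + rho) for ci, ui in zip(c, u)]
        z = sum(xi + ui for xi, ui in zip(x, u)) / n   # z-update: averaging
        u = [ui + xi - z for ui, xi in zip(u, x)]      # dual ascent step
    return z

z = consensus_admm([1.0, 3.0])   # agents agree near the average, 2.0
```

In the fleet setting, each "agent" would be a vehicle's routing subproblem and the coupling constraint would tie the per-vehicle plans to shared demand.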

B. REINFORCEMENT LEARNING APPROACHES
Reinforcement learning methods have significant advantages in solving large-scale, real-time problems that are difficult to model accurately. We classify the works utilizing reinforcement learning approaches into three categories based on how the optimal solution is obtained; the algorithm schemes of the three methods are shown in Fig. 4.

1) VALUE-BASED
The value-based approaches deploy a deep neural network to estimate the value function of an action or a state and implicitly generate a deterministic policy through the value function: actions are decided by choosing the best action in the current state. The temporal difference (TD) error specifies how different the new value estimate is from the old prediction. The Deep Q-Network (DQN) is the most typical and widely used algorithm. DQN-based algorithms are used in [9], [26], [27], [28], [52], [79], [80], [81], [82], [83], [84], [85], and [86] to solve fleet operation and charging management problems. [52] formulates the vehicle dispatching and rebalancing problem as a semi-MDP. The distribution of orders is estimated using the cerebellar value network (CVNet), and the map is divided into hexagonal grids to improve the efficiency and scalability of the solution. DQN is utilized for policy learning with the aim of maximizing drivers' revenue while minimizing the average pick-up distance over all orders. Order dispatching, rebalancing, and charging strategies are formulated as a partially observed Markov decision process in [9]. Meanwhile, a binary linear program is embedded in the reinforcement learning process to select the globally optimal action, making it possible to form an online scheduling strategy suitable for a large-scale fleet that maximizes overall revenue. A two-layer dynamic programming problem is transformed into a single layer in [70], and DQN realizes dynamic mileage pricing of the order service to maximize system income.
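The TD-error update at the heart of these methods can be illustrated with a minimal tabular Q-learning sketch on a hypothetical three-state chain; DQN replaces the table with a neural network but uses the same target.

```python
# Tabular Q-learning sketch on a toy 3-state chain (hypothetical MDP):
# action 1 moves right, action 0 stays; reaching terminal state 2 pays 1.
# The TD error (target minus current estimate) drives each update.
import random

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    random.seed(seed)
    Q = {(s, a): 0.0 for s in range(3) for a in (0, 1)}
    for _ in range(episodes):
        s = 0
        while s != 2:                        # state 2 is terminal
            if random.random() < eps:        # epsilon-greedy exploration
                a = random.choice((0, 1))
            else:
                a = max((0, 1), key=lambda x: Q[(s, x)])
            s2 = min(s + 1, 2) if a == 1 else s
            r = 1.0 if s2 == 2 else 0.0      # reward on reaching terminal
            # TD error: (r + discounted best next value) - current estimate
            td = r + gamma * max(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)]
            Q[(s, a)] += alpha * td
            s = s2
    return Q

Q = q_learning()   # Q[(1, 1)] approaches 1.0 and Q[(0, 1)] approaches 0.9
```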

2) POLICY-BASED
The policy-based approaches fit a policy function instead of a value function through the neural network, which converges better and is applicable to higher-dimensional action spaces. Actions are decided based on the policy's probability distribution. Policy-based algorithms are deployed in [51], [69], [71], [87], and [88] to determine fleet operation and charging scheduling to maximize overall social welfare. [51] divides vehicles into an order-dispatching (OD) group for order serving and a fleet management (FM) group for rebalancing, designing a novel framework that learns to collaborate in a hierarchical multi-agent setting for a ride-hailing platform. [88] formulates the fleet charging scheduling problem as a two-layer model, with one layer for transportation and the other for power. The Deep Deterministic Policy Gradient (DDPG) is employed to form the electricity price and guide fleet charging decisions, achieving joint optimization of the transportation-power network. Based on the principle of network flow, an MDP model is constructed in [69] to achieve fleet management by maintaining a queue of waiting passengers. The Proximal Policy Optimization (PPO) algorithm is utilized to dynamically price order revenue to maximize driver profits.
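The policy-gradient principle can be sketched with a minimal REINFORCE example on a hypothetical two-armed bandit: the softmax policy parameters are updated directly along the gradient of expected reward, with no value function involved.

```python
# REINFORCE sketch on a hypothetical two-armed bandit: arm 1 pays more.
# The policy itself (softmax over preferences theta) is the learned object.
import math
import random

def softmax(theta):
    exps = [math.exp(t) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(steps=2000, lr=0.1, seed=1):
    random.seed(seed)
    theta = [0.0, 0.0]                 # action preferences (policy parameters)
    for _ in range(steps):
        probs = softmax(theta)
        a = 0 if random.random() < probs[0] else 1   # sample from the policy
        r = 1.0 if a == 1 else 0.2     # arm 1 is better in expectation
        # update along r * grad log pi(a); for softmax: one-hot(a) - probs
        for i in range(2):
            theta[i] += lr * r * ((1.0 if i == a else 0.0) - probs[i])
    return softmax(theta)

probs = reinforce()                    # the policy concentrates on arm 1
```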
3) ACTOR-CRITIC
[68], [89], [90], and [91] apply actor-critic based algorithms, which combine the first two approaches by using an estimated value function to critique actions and update the policy network, obtaining the optimal policy faster. [91] proposes a multi-agent reinforcement learning (MARL) framework and connects two neural networks to improve the overall fleet pick-up rate and overall revenue; extensive experiments show that the proposed approach is robust to different levels of system expansion dynamics. In [68], a novel reward function is designed to solve the dynamic service pricing problem in ride-hailing platforms. The proposed reward function helps the Soft Actor-Critic (SAC) model achieve faster convergence and higher income than methods taking revenue alone as the reward. [92] reformulates a mixed-integer programming model into a decentralized Markov decision process model with centralized training and distributed execution. The model is solved with a unique actor network for each agent and a shared critic network to address the scalability issues of large-scale smart grid systems.
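The actor-critic combination can be sketched on a hypothetical two-armed bandit: a critic maintains a baseline value estimate, and the actor's softmax policy is updated with the advantage signal (reward minus baseline) rather than the raw reward, which reduces update variance.

```python
# Actor-critic sketch on a hypothetical two-armed bandit (arm 1 pays more).
# Critic: scalar baseline V. Actor: softmax policy updated with advantage.
import math
import random

def softmax(theta):
    exps = [math.exp(t) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def actor_critic(steps=2000, lr_a=0.1, lr_c=0.1, seed=2):
    random.seed(seed)
    theta = [0.0, 0.0]     # actor: action preferences
    V = 0.0                # critic: baseline value estimate
    for _ in range(steps):
        probs = softmax(theta)
        a = 0 if random.random() < probs[0] else 1
        r = 1.0 if a == 1 else 0.2
        adv = r - V                      # advantage: how the action compares
        V += lr_c * adv                  # critic moves toward observed reward
        # actor update uses adv instead of raw r (lower-variance signal)
        for i in range(2):
            theta[i] += lr_a * adv * ((1.0 if i == a else 0.0) - probs[i])
    return softmax(theta)

probs = actor_critic()                   # policy again favors the better arm
```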

C. HYBRID APPROACHES
To realize online performance and characterize hard constraints in the problem, combined mathematical programming and learning-based methods are employed in [10], [93], and [94]. [93] decouples dispatching and rebalancing (neglecting routing) into two linear programs and links them through reinforcement learning. Vehicle dispatching is obtained by solving the first linear program, and the optimal rebalancing vehicle distribution is computed via reinforcement learning (based on graph neural networks) and realized by solving the second linear program. The actor-critic algorithm is utilized to maximize driver profit. In [10], a Stackelberg equilibrium captures the responsive behavior of the MoD operator (order serving, repositioning, and charging), which is formulated as a multi-commodity network flow model. A SAC-based multi-agent deep reinforcement learning algorithm is developed to solve the proposed equilibrium framework. [94] proposes a reinforcement learning-based algorithm with decentralized learning and centralized decision-making components. The centralized decision-making process enables coordination of individual EVs by formulating the EV fleet dispatching problem as a linear assignment problem, which maximizes the EV fleet's action value function.
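The linear assignment step in such hybrid schemes can be illustrated with a minimal sketch: each EV is matched to one request so that the summed action value is maximized. Brute-force enumeration over permutations is a hypothetical stand-in for a proper assignment solver, and the value matrix is an illustrative set of learned Q-estimates.

```python
# Linear assignment sketch: match n EVs to n requests to maximize total
# value. Enumeration is exponential; real systems use an LP/Hungarian
# solver, but the optimization problem is identical.
from itertools import permutations

def assign(values):
    """values[i][j] = estimated value of EV i serving request j (square)."""
    n = len(values)
    best, best_val = None, float("-inf")
    for perm in permutations(range(n)):          # perm[i] = request for EV i
        total = sum(values[i][perm[i]] for i in range(n))
        if total > best_val:
            best, best_val = list(perm), total
    return best, best_val

match, total = assign([[3.0, 1.0], [2.0, 4.0]])  # → ([0, 1], 7.0)
```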

D. DISCUSSION
The aforementioned solution methods have their own advantages and disadvantages, which are presented in the following points.

1) OPTIMALITY
Generally speaking, the optimality of mathematical programming methods can be guaranteed if the problem is convex. In contrast, the optimality of reinforcement learning based solutions cannot be theoretically guaranteed.

2) CONSTRAINTS
Reinforcement learning approaches can face difficulties when incorporating complex physical constraints, while mathematical programming methods allow accurate representation of the physical operational constraints of the shared MoD system.

3) COMPUTATIONAL EFFICIENCY AND SCALABILITY
A major drawback of mathematical programming approaches is that solving large-scale problems is always challenging. This becomes even worse if a more detailed spatial-temporal model or real-time decision making is desired. The advantage of reinforcement learning methods is that they can adaptively learn a near-optimal solution using the capability of neural networks. For online implementation, it becomes increasingly difficult for mathematical programming methods to solve the problem as the time slot becomes shorter.
Therefore, hybrid approaches combine the advantages of these two approaches, including online implementation and physical constraint characterization. However, they still require careful tuning to achieve ideal performance. The comparison of the three kinds of reinforcement learning approaches is presented in Table 1.

IV. RESEARCH OUTLOOK
The aforementioned works have made remarkable progress in this research area. However, several other interesting research directions could be investigated to further account for the operational characteristics and future trends of the shared MoD system:

A. QUALITY OF SERVICE
To describe the operational characteristics of the shared MoD system in a more detailed and realistic manner, the quality of service for passengers and drivers shall be accounted for in the modeling. Relevant indices include the order-serving waiting time, charging waiting time, relocation distance, and average trip time. Incorporating these factors into the models can further reduce the divergence between computational results and real-world outcomes.

B. IMPACT ON POWER DISTRIBUTION SYSTEM
The impact of the shared MoD system on the power distribution system shall be further investigated. With its unique flexibility, coordinated fleet management and charging scheduling could turn the charging load into a movable and dispatchable demand-side resource, which is quite valuable for the flexible operation of the power distribution system and enhances its capability in voltage regulation, renewable energy integration, and congestion management.

C. NAVIGATION MECHANISM
The navigation mechanism for the shared MoD system shall be coordinated with both power system and transportation system operation. Dynamic charging prices can serve as a signal to alter the charging scheduling decisions, and spatial-temporal-aware transportation prices can further affect the fleet management decisions. Combining these factors, the shared MoD system may be able to provide more significant flexibility to improve the operational efficiency of the coupled power-transportation system.

D. COMPETITION AMONG VARIED ENTITIES
In real-world scenarios, multiple shared MoD fleets simultaneously exist in the same transportation system. Therefore, it is important to investigate the competition equilibrium among multiple MoD fleets. Furthermore, the competition between shared MoD fleets and other travel options, such as private vehicles and public transit, shall be further explored. Moreover, the interactions among the distribution system operator, the shared MoD system operator, and end-users could also be investigated.

E. ENVIRONMENTAL IMPACT
The environmental impact of the shared MoD system shall be further analyzed. Particularly, the spatial-temporal flexibility induced by the shared MoD system in both the power and transportation systems shall be investigated to quantify its environmental impact in both sectors. Moreover, long-term savings, such as reducing the need for additional power and transportation infrastructure, shall also be considered when calculating the environmental impact.

F. MODELING AND SOLUTION METHOD
As the scale of the shared MoD system keeps increasing, it becomes more and more important to efficiently model the fleet management and charging scheduling of a large-scale shared MoD fleet. Furthermore, these types of operational problems shall be solved in an online manner. Therefore, it is important to further investigate computationally inexpensive modeling and solution methods for the large-scale shared MoD fleet. Moreover, how to combine model-based and model-free (i.e., data-driven) approaches to achieve a trade-off between modeling accuracy and computational efficiency shall be further discussed.

V. CONCLUSION
In this paper, we provide a comprehensive review of shared MoD system research. We categorize the research based on the modeling approaches and solution methods. The operational problems of the shared MoD system are classified into four types: 1) order-dispatching, 2) order-dispatching and rebalancing, 3) order-dispatching, rebalancing and charging, and 4) extended. Mathematical models include graph-based, queue theory-based, grid-based, and others such as the cell transmission model, which is relatively rare. Among these models, graph-based models represent the road network most clearly, and queue theory-based models are appropriate for measuring the quality of service. Grid-based models are suitable for integration with data-driven methods or reinforcement learning. Therefore, we suggest selecting the proper modeling approach based on the objective of the study. A graph-based model would perform better if detailed road traffic analysis is desired. In contrast, a grid-based model would be a good option if the fleet management decisions need to be determined quickly.
Solution methods are divided into mathematical programming approaches, reinforcement learning approaches, and hybrid approaches. Mathematical programming approaches include linear or non-linear programming, heuristic algorithms, and model predictive control. They can accurately characterize all the physical operational constraints of the shared MoD system. However, their scalability is poor when detailed spatial-temporal constraints are considered. Reinforcement learning approaches include value-based, policy-based, and actor-critic algorithms. They can adaptively learn a near-optimal solution utilizing neural networks. Hybrid approaches combine learning methodology and mathematical programming to characterize physical constraints and realize online performance in large-scale implementations.
Therefore, the proper solution method depends on the problem type. If a large-scale problem requires a real-time solution, reinforcement learning performs better in terms of speed and efficiency. When the problem requires exact constraint expressions and solution optimality, a mathematical programming approach would be the choice, especially when a convex program can be formulated.