Three-player Cooperative Game with Side-payments for Discretionary Lane Changes of Connected Vehicles

The rapid development of new technologies, e.g., connected vehicles and e-wallets, offers opportunities to rethink cooperative lane-change (LC) models. Most existing cooperative LC models implicitly assume that travelers are selfless, which is usually too stringent. A cooperative LC game framework with appropriate side-payments can make up for this deficiency. In such games, when travelers maximize their own benefits, they naturally cooperate to improve the system performance. This paper designs a four-step, transferable-utility-based three-player game framework which, combined with the two-player model in the previous work, covers all LC scenarios on multi-lane roads. Our four-step framework is also suitable for other cooperative games with side-payments. Furthermore, the simulation results show that cooperative games with side-payments can reduce vehicles' LC frequency while benefiting all transaction vehicles in expectation in most scenarios. Moreover, urgent vehicles with very high values of time can save up to 42% of their travel time on congested roads under the simulation settings.


I. INTRODUCTION
Discretionary Lane Change (LC) is one of the primary driving behaviors observed in traffic flow. Its main purpose is to gain a higher speed or a better driving environment [1]. According to National Highway Traffic Safety Administration (NHTSA) research data, LC accidents accounted for 27% of all accidents [2], and EU statistics show that 75% of the LC accidents were caused by drivers' mistaken decisions [3]. The rapid development of new technologies, e.g., connected vehicles and e-wallets, offers opportunities to rethink real-time traffic operation. With quick information exchange through connected vehicle technology, vehicles can react much faster without worrying about mistaken LCs. And with the possibility of receiving compensation through an e-wallet for giving up the right-of-way, the lane changer and the follower in the target lane (which we call the lane keeper in the rest of the paper) can not only achieve win-win results, but also avoid some unreasonable LC requests.
The efforts on LC modeling mainly fall into three themes: modeling the LC decision-making process [4]-[9], modeling the LC impact on surrounding vehicles [10], [11], and modeling the LC (trajectory) control [12]-[15]. We focus on LC decision modeling in this paper. A main issue of traditional LC decision models is that the decision is often modeled as a one-player (the lane changer) decision-making process [1]. For example, the Gipps model regards the LC decision as a deterministic result of the lane changer's comprehensive consideration of the LC and hence ignores almost all interactions between the lane changer and the lane keeper [4]. Field observations, however, found that a typical decision in congested traffic closely involves at least two vehicles: the lane changer and the lane keeper [11].
To describe the competing relationship (for the right-of-way) between the lane changer and the lane keeper, researchers started to use game-theoretic techniques [16]-[18]. Most of them assumed that travelers are purely egoistic and treated the LCs as non-cooperative games (using solutions such as the Nash Equilibrium and the Stackelberg Equilibrium), in which the lane changer and the lane keeper test each other and try to maximize their own benefits. With connected/automated vehicle technologies, researchers have started to consider cooperative LC games to achieve a collective or group optimum [19], [20]. Such games are usually preferred by system optimization researchers since they can improve the overall system performance. In contrast to the egoistic assumption, most cooperative LC games assume that travelers are entirely selfless and are willing to do whatever is required to help reach the collective or group optimum [12]-[14], which is an even more stringent assumption.
Combining an economic instrument (e.g., direct-transaction) with e-wallet technology, researchers could achieve some cooperative solutions without the selfless assumption [21]-[23]. They designed real-time mechanisms to allow for side-payments in LC scenarios. Using such mechanisms, when travelers maximize their own benefits, they naturally cooperate to improve the system performance. Specifically, Lin et al. proposed a cooperative game framework for discretionary LC [21], [22]. They adopted the framework of the Transferable Utility (TU) game and allowed vehicles to make monetary transactions regarding the right-of-way. Lloret-Batlle and Jayakrishnan proposed an optimization method to serve mandatory LC on highways [23]. They minimized the vehicles' "envy" and allowed vehicles to pay to jump the queue, where "envy" is the level of desire for other vehicles' side-payment and allocation. Similar direct-transaction frameworks also work in real-time intersection control [24]-[27]. The key to the direct-transaction mechanism is to value vehicles' potential time saving (PTS) using the value of time (VOT), through which researchers can turn time into tradable "property".
This paper belongs to a series of Lin et al.'s studies on cooperative discretionary LC with side-payments. As the pioneering work in applying direct-transactions to LC, the previous work [22] only dealt with two-player games. However, on highways with more than two lanes, there are cases in which three vehicles compete for priority simultaneously. Hence, for completeness, we design a three-player LC game framework (for multi-lane highways) that is compatible with the two-player LC game. Furthermore, this paper proposes a simple and efficient four-step side-payment framework. This four-step framework is also suitable for other cooperative games that consider side-payments. Moreover, this paper investigates more complicated simulation scenarios, such as highways whose lanes have different speed limits, vehicles with more than two VOTs, etc. They enrich the scenarios of [22].
The rest of the paper is organized as follows: the next section describes the problem, the subsequent section presents the four-step three-player cooperative mechanism framework, the fourth section shows simulation experiment results, and the last section concludes the paper.

II. PROBLEM DESCRIPTION
The problem we consider in this paper is the discretionary LC games that take place on multi-lane highways. Fig. 1 shows a scenario with different kinds of vehicles. We assume a connected vehicle (CV) environment in which all vehicles are able to exchange information with each other, including position, speed, LC intention, PTS, VOT, etc. This paper assumes that the information communicated through the CV technology is accurate. Furthermore, we assume mixed traffic: subscriber vehicles (colored vehicles in Fig. 1) capable of (and willing to) engage in lane change transactions, referred to as transaction vehicles (TVs), and outsider vehicles (black vehicles in Fig. 1) that either do not possess transaction capabilities or do not wish to engage in transactions, referred to as non-transaction vehicles (NTVs). All vehicles are free to choose between being TVs or NTVs, and their choices are regarded as an input to the model. Moreover, we assume TVs are rational, which means their preferred action is always the one with the higher expected speed (within the speed limit). The first two cases can be solved by the two-player LC game in [22], and this paper focuses on solving the three-player LC game with side-payments in the last case.
We note that for highways with more than three lanes, the maximum number of players competing for the same right-of-way is still three. This is because we allow at most one vehicle in each lane to participate in a game, and vehicles can only compete for the right-of-way in their own lane or adjacent lane(s). For scenarios in which an extra vehicle's action is affected by the result of a game (one such example is shown in Fig. 3), the existing game is played first, and the affected vehicle decides at the next time instant whether to play a new game, depending on the result of the existing game. For example, in the scenario in Fig. 3, the red, orange and black vehicles first play a three-player LC game; based on the result of this game, at the next time instant, the yellow vehicle either plays a game with the black vehicle (if the result is that the black vehicle does not change lane), or changes lane directly without playing a game (if the result is that the black vehicle changes lane). Hence, the combination of the two-player and three-player games can cover all scenarios on multi-lane highways.

A. FOUR-STEP FRAMEWORK
Our real-time direct-transaction framework includes the following four steps.
• Step 1: Determine the game players.
• Step 2: Calculate the payoffs for different action combinations.
• Step 3: Choose the best action combination that maximizes the total payoff.
• Step 4: Calculate the side-payments and make transactions.
We shall introduce each step for three-player games in detail in this section.

B. STEP 1: GAMES AND VEHICLE PLAYERS
We denote $\mathcal{X} = \{A, B, C\}$ as the player set in a three-player game, where player $X \in \mathcal{X}$. NTVs will not play a TU game, hence any LC game involving NTV(s) will first play a Non-TU (NTU) game in which the right-of-way is randomly allocated to each player with equal probability. For a three-player LC game, we have three cases:
1) If only one player is an NTV, then this NTV has a 1/3 probability to win the game and take the right-of-way without side-payment, and the other two TVs together have a 2/3 probability to win the game. If the result is for the two TVs to win, then they will play a two-player TU game (shown in [22]), and the final winner TV will pay the loser TV and take the right-of-way.
2) If two or three players are NTVs, then each player has a 1/3 probability to win the game and take the right-of-way without side-payment.
3) If all players are TVs, then they will play a three-player TU game; the winner TV will pay the two loser TVs and take the right-of-way.
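The three cases above can be sketched as a small dispatch routine. This is only an illustration under stated assumptions: the function name `resolve_lc_game`, the labels `"TU3"`, `"TU2"`, `"NTU"`, and the dictionary input are hypothetical and not part of the original model description.

```python
import random

def resolve_lc_game(is_tv, rng=random):
    """Decide which game settles a three-player LC conflict (Step 1).

    is_tv: dict mapping player name -> True for a TV, False for an NTV.
    Returns (game, players):
      "TU3" -> all three players enter a three-player TU game,
      "TU2" -> the two TVs enter a two-player TU game,
      "NTU" -> the single listed player takes the right-of-way, no side-payment.
    """
    players = list(is_tv)
    ntvs = [x for x in players if not is_tv[x]]
    if not ntvs:
        return "TU3", players              # case 3: all players are TVs
    # Any game involving an NTV starts with an equal-probability lottery.
    winner = rng.choice(players)
    if len(ntvs) == 1 and is_tv[winner]:
        # Case 1 with a TV drawn (probability 2/3): the two TVs play a TU game.
        return "TU2", [x for x in players if is_tv[x]]
    # Case 1 with the NTV drawn, or case 2: the drawn player simply wins.
    return "NTU", [winner]
```

For example, `resolve_lc_game({"A": True, "B": True, "C": False})` returns `("NTU", ["C"])` about one third of the time and `("TU2", ["A", "B"])` otherwise.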

Following Step 1, if a TU game is needed to solve the LC game, the second step is to calculate the payoffs for all action combinations in the TU game.
For player $X$, there are two actions: the preferred action "1" with a potential speed of $v_X^1$, and the alternative action "2" with a potential speed of $v_X^2$. Since we assume TVs are rational, we have $v_X^1 > v_X^2$. The payoff function of $X$, denoted by $u_X$, depends on its own and the other players' action choices. We set the payoff of $X$'s alternative action as the baseline with a value of 0. The payoff of $X$'s preferred action is set to a large negative value $-M$ if it leads to a crash. Otherwise, the payoff of $X$'s preferred action is a quasi-linear function of the VOT ($\beta_X$) and the PTS ($t_X$). We have

$$u_X = \begin{cases} -M, & v_X = v_X^1 \text{ and } v_Y = v_Y^1 \text{ for some } Y \in \mathcal{X}_{-1}, \\ \beta_X t_X, & v_X = v_X^1 \text{ and } v_Y = v_Y^2 \text{ for all } Y \in \mathcal{X}_{-1}, \\ 0, & v_X = v_X^2, \end{cases} \tag{1}$$

where $v_X$ is the potential speed of $X$, and $\mathcal{X}_{-1}$ is the set of players in $\mathcal{X}$ excluding $X$. The first condition in (1) says that as long as more than one vehicle chooses the preferred action, there will be a crash, and the payoff of any vehicle involved in the crash is $-M$. The second condition in (1) says that if only one vehicle chooses the preferred action, then this vehicle's payoff is the product of its VOT and PTS. The last condition in (1) says that the payoff of any vehicle that chooses the alternative action is 0.

As VOTs are reported by the vehicles, the key to calculating payoffs is to estimate the PTS. We adopt the PTS estimation model (2) of [22], in which $\mathrm{Sign}$ is the sign function, $v_E$ is the expected mean speed of the road segment, which is influenced by the traffic density, $\tau$ is the speed adjustment time, and $a$ is the mean acceleration rate. Because we set $a > 0$ in this paper, the sign function indicates acceleration or deceleration. In short, (2) divides the influence of the LC game into two stages. During the first stage, vehicles adjust their initial speed to the preferred or alternative speed. During the second stage, vehicles gradually recover from the new speed to the expected mean speed. The gain in distance during the two stages under the preferred speed is then divided by the expected mean speed to obtain the PTS.

VOLUME 0, 2021
The detailed analysis is in [22]. Consider the three-player game in Fig. 2(c). We denote the payoff to $X$ in a three-player game as $u_X^{ijk}$, where $i, j, k \in \{1, 2\}$ are the action choices of A, B, and C, respectively. According to (1), the payoffs of each action combination are listed in Table 1.
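As an illustration of the three payoff conditions described for (1), the following sketch enumerates all eight action combinations of a three-player game. The function name `payoff_table`, the finite stand-in for $M$, and the numeric VOT and PTS values are assumptions made only for this example.

```python
from itertools import product

M = 1e6  # hypothetical finite stand-in for the large crash penalty M

def payoff_table(beta, t, players=("A", "B", "C"), crash_penalty=M):
    """Return {(i, j, k): (u_A, u_B, u_C)}; action 1 = preferred, 2 = alternative."""
    table = {}
    for actions in product((1, 2), repeat=len(players)):
        n_preferred = actions.count(1)
        payoffs = []
        for x, action in zip(players, actions):
            if action == 2:
                payoffs.append(0.0)                # alternative action: baseline 0
            elif n_preferred > 1:
                payoffs.append(-crash_penalty)     # more than one preferred action: crash
            else:
                payoffs.append(beta[x] * t[x])     # sole preferred action: VOT times PTS
        table[actions] = tuple(payoffs)
    return table

# Illustrative (made-up) VOT and PTS values in consistent units.
beta = {"A": 4.0, "B": 2.0, "C": 3.0}
t = {"A": 1.5, "B": 1.0, "C": 0.5}
table = payoff_table(beta, t)
```

With these inputs, `table[(1, 2, 2)]` is `(6.0, 0.0, 0.0)` and `table[(2, 1, 1)]` is `(0.0, -M, -M)`.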

D. STEP 3: PARETO-OPTIMAL FRONTIER AND BEST ACTION COMBINATION
In a three-player game, we let $\omega^{ijk}$ denote the total payoff that can be achieved by players A, B, and C under action combination $(i, j, k)$, that is, $\omega^{ijk} = \sum_{X \in \mathcal{X}} u_X^{ijk}$. Hence, system optimality implies that the chosen action combination is the one that maximizes the total payoff:

$$(i^*, j^*, k^*) \in \arg\max_{(i,j,k)} \omega^{ijk}, \tag{3}$$

where $(i^*, j^*, k^*)$ is (any of) the best action combination(s). Denote the maximal payoff by $\omega^*$, then

$$\omega^* = \omega^{i^*j^*k^*} = \max_{(i,j,k)} \omega^{ijk}. \tag{4}$$

In our TU game framework, we allow side-payments among players after an action combination is chosen. Let $\sigma_X \in \mathbb{R}$ denote a feasible side-payment of player $X$. We adopt the convention that $\sigma_X > 0$ means $X$ receives payment, and $\sigma_X < 0$ means $X$ pays the others. The total side-payment of a game is zero: $\sum_{X \in \mathcal{X}} \sigma_X = 0$. For any $(i, j, k)$ combination, we define the final payoff of $X$ after side-payment as $\tilde{u}_X^{ijk}$, with

$$\tilde{u}_X^{ijk} = u_X^{ijk} + \sigma_X, \qquad \sum_{X \in \mathcal{X}} \tilde{u}_X^{ijk} = \omega^{ijk}, \tag{5}$$

where $\omega^{ijk}$ is a constant (these constants can vary from one $(i, j, k)$ combination to another, with $\omega^{i^*j^*k^*} = \omega^*$). The set of final payoff combinations $\tilde{u}_X^{ijk}$ associated with the strategy point $(i, j, k)$ in a TU game therefore falls on a plane going through the point $(u_A^{ijk}, u_B^{ijk}, u_C^{ijk})$ with the same intercept $\omega^{ijk}$ on the three axes. We name the plane that goes through $(u_A^{i^*j^*k^*}, u_B^{i^*j^*k^*}, u_C^{i^*j^*k^*})$ the Pareto-optimal frontier. An example is shown in Fig. 4. The eight points are the eight action combinations without side-payments, and in this example we suppose $(\beta_A t_A, 0, 0)$ is the payoff vector of the best action combination. Hence, the blue plane going through the point $(\beta_A t_A, 0, 0)$ is the Pareto-optimal frontier. Any point on this plane is a feasible, optimal, and cooperative solution of the TU game, which has the maximum total payoff. Determining the side-payments amounts to choosing one appropriate point on the plane that satisfies all three vehicles.
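Selecting the best action combination amounts to a maximization over eight totals. The sketch below is a minimal illustration; the helper name `best_action` and the numeric payoff table (with a finite stand-in `M` for the crash penalty) are assumptions for this example only.

```python
M = 1e6  # hypothetical finite stand-in for the crash penalty

# Illustrative payoff vectors (u_A, u_B, u_C) per action combination (i, j, k);
# 1 = preferred action, 2 = alternative action.
example_table = {
    (1, 1, 1): (-M, -M, -M),
    (1, 1, 2): (-M, -M, 0.0),
    (1, 2, 1): (-M, 0.0, -M),
    (1, 2, 2): (6.0, 0.0, 0.0),
    (2, 1, 1): (0.0, -M, -M),
    (2, 1, 2): (0.0, 2.0, 0.0),
    (2, 2, 1): (0.0, 0.0, 1.5),
    (2, 2, 2): (0.0, 0.0, 0.0),
}

def best_action(table):
    """Return (best_combo, omega_star): the combination maximizing the total payoff."""
    totals = {combo: sum(u) for combo, u in table.items()}
    best = max(totals, key=totals.get)   # on ties, the first maximizer is returned
    return best, totals[best]

combo_star, omega_star = best_action(example_table)
```

Here `combo_star` is `(1, 2, 2)` and `omega_star` is `6.0`, so every final payoff vector on the Pareto-optimal frontier sums to 6.0 in this example.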

E. STEP 4: TRANSFERABLE UTILITY GAME AND SIDE-PAYMENTS
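To determine the final side-payments, our cooperative game framework uses a three-player TU game similar to the two-player game in [28]. In the absence of agreement, players choose mixed strategies that ensure guaranteed payoffs independent of the other players' strategies. This is known as the threat strategy. Let $p_X^i$ denote $X$'s threat strategy, where $i \in \{1, 2\}$ is the action choice, $p_X^i \in [0, 1]$ is the probability for vehicle $X$ to choose action $i$, with $\sum_{i=1}^{2} p_X^i = 1$, and $X \in \mathcal{X}$. Hence in the absence of agreement, when all vehicles choose the threat strategies, the payoff of player $X$, denoted by $S_X$, is

$$S_X = \sum_{i=1}^{2} \sum_{j=1}^{2} \sum_{k=1}^{2} p_A^i \, p_B^j \, p_C^k \, u_X^{ijk}. \tag{6}$$

The solution to the TU game is one such that the expected payoff of $X$ is no less than $S_X$. This defines three points on the Pareto-optimal frontier: $(S_A, S_B, \omega^* - S_A - S_B)$, $(S_A, \omega^* - S_A - S_C, S_C)$, and $(\omega^* - S_B - S_C, S_B, S_C)$. Any convex combination of these three points is a TU solution and, in particular, the midway point constitutes a natural compromise [29]. The final payoffs of A, B, and C at the midway point should be $(\omega^* + 2S_A - S_B - S_C)/3$, $(\omega^* + 2S_B - S_A - S_C)/3$, and $(\omega^* + 2S_C - S_A - S_B)/3$, respectively. We hence have

$$\tilde{u}_X^{i^*j^*k^*} = \frac{1}{3}\Bigl(\omega^* + 2S_X - \sum_{Y \in \mathcal{X}_{-1}} S_Y\Bigr), \quad \forall X \in \mathcal{X}. \tag{7}$$

The side-payments in a three-player TU game are therefore given by

$$\sigma_X = \tilde{u}_X^{i^*j^*k^*} - u_X^{i^*j^*k^*} = \frac{1}{3}\Bigl(\omega^* + 2S_X - \sum_{Y \in \mathcal{X}_{-1}} S_Y\Bigr) - u_X^{i^*j^*k^*}, \quad \forall X \in \mathcal{X}. \tag{8}$$

To determine the threat strategies, first note that $X$ is naturally incentivized to select a threat strategy that maximizes $\sigma_X$, which is equivalent to maximizing $2S_X - \sum_{Y \in \mathcal{X}_{-1}} S_Y$ according to (8). We define

$$F_X = 2S_X - \sum_{Y \in \mathcal{X}_{-1}} S_Y. \tag{9}$$

If we let $\partial F_X / \partial p_X^1 = 0$, $\forall X \in \mathcal{X}$, and let $M \to \infty$, we have $p_X^1 \to 0$. We hence take $p_X^1 = 0$, $p_X^2 = 1$, $\forall X \in \mathcal{X}$, as the threat strategy, which corresponds to the action combination $(i, j, k) = (2, 2, 2)$. Therefore, $S_X = 0$, and we have

$$\sigma_X = \frac{\omega^*}{3} - u_X^{i^*j^*k^*}, \quad \forall X \in \mathcal{X}. \tag{10}$$

TABLE 1 (recovered portion, player A takes the alternative action). Payoff vectors $(u_A, u_B, u_C)$.
                        C: Do not give way      C: Give way
B: Do not give way      $(0, -M, -M)$           $(0, \beta_B t_B, 0)$
B: Give way             $(0, 0, \beta_C t_C)$   $(0, 0, 0)$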
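The limit $p_X^1 \to 0$ can be checked with a short calculation. The sketch below assumes that players mix independently, so that $S_X$ is the expected payoff under the threat strategies, and it uses the shorthand $b = p_B^1$, $c = p_C^1$, which is introduced only for this illustration.

```latex
% Assumption: independent mixing; payoffs follow the three conditions of (1).
S_A = p_A^1 \Bigl[ (1-b)(1-c)\,\beta_A t_A - \bigl(1-(1-b)(1-c)\bigr) M \Bigr],
\qquad b = p_B^1,\; c = p_C^1 .

% F_A = 2 S_A - S_B - S_C is linear in p_A^1, and
\frac{\partial S_B}{\partial p_A^1} = -\,b\,(1-c)\,(\beta_B t_B + M), \qquad
\frac{\partial S_C}{\partial p_A^1} = -\,c\,(1-b)\,(\beta_C t_C + M).

% Dividing the derivative of F_A by M and letting M grow without bound:
\lim_{M\to\infty} \frac{1}{M}\,\frac{\partial F_A}{\partial p_A^1}
  = -2\,(b + c - bc) + b\,(1-c) + c\,(1-b) = -(b + c).
```

Setting this limit to zero with $b, c \ge 0$ forces $p_B^1 = p_C^1 = 0$; applying the same argument to $F_B$ and $F_C$ gives $p_A^1 = 0$ as well, which agrees with the threat strategy $(2, 2, 2)$ stated above.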
Fig. 5 shows the TU solution of the example in Fig. 4.
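The midway-point rule can be illustrated numerically. The sketch below computes the final payoffs and side-payments from the best-action payoffs and a threat point; the function name `tu_side_payments` and all numbers are assumptions for illustration, and the non-zero threat point in the second call is used only to exercise the general formula (the paper's threat point is zero).

```python
def tu_side_payments(u_star, threat):
    """Midway-point solution of a three-player TU game.

    u_star: payoffs under the best action combination, keyed by player.
    threat: threat-point payoffs S_X, keyed by player.
    Returns (final_payoffs, side_payments).
    """
    omega_star = sum(u_star.values())
    total_s = sum(threat.values())
    # (omega* + 2 S_X - sum of the other players' S) / 3, rewritten with the full sum.
    final = {x: (omega_star + 3 * threat[x] - total_s) / 3 for x in u_star}
    sigma = {x: final[x] - u_star[x] for x in u_star}
    return final, sigma

# Zero threat point: side-payments reduce to omega*/3 - u_X.
u_star = {"A": 6.0, "B": 0.0, "C": 0.0}
final, sigma = tu_side_payments(u_star, {"A": 0.0, "B": 0.0, "C": 0.0})

# Hypothetical non-zero threat point, only to check the centroid property.
final2, sigma2 = tu_side_payments(u_star, {"A": 1.0, "B": 0.5, "C": 0.25})
```

In the first call the winner A pays 4.0 in total (`sigma["A"] == -4.0`) and B and C each receive 2.0, so all three end with a final payoff of 2.0.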

F. SUMMARY
When implementing the direct-transaction mechanism for discretionary LCs on multi-lane roads, one needs to follow the above four steps one by one. When a three-player TU game is involved, one can use the methodology proposed in this paper, and its side-payments can be calculated by (10). When a two-player TU game is involved, one can use the model in [22], and its side-payments are calculated by $\sigma_X = \omega^*/2 - u_X^{i^*j^*}$, $\forall X \in \{A, B\}$.
The direct-transaction LC framework is summarized in Fig. 6.
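Steps 2 to 4 can be chained into one routine for an all-TV game with two or three players. The following is a minimal sketch under stated assumptions: the function name `settle_tu_game`, the finite crash penalty, and the numeric inputs are hypothetical, and the threat point is taken as zero so that each side-payment equals $\omega^*$ divided by the number of players minus the player's own payoff.

```python
from itertools import product

def settle_tu_game(beta, t, crash_penalty=1e6):
    """Steps 2-4 for an all-TV game; beta and t are VOT and PTS keyed by player.

    Returns (winner, side_payments); winner is None if nobody takes the preferred action.
    """
    players = list(beta)
    n = len(players)
    best_combo, omega_star, best_u = None, float("-inf"), None
    for combo in product((1, 2), repeat=n):
        # Step 2: payoffs of this action combination.
        crash = combo.count(1) > 1
        u = [0.0 if a == 2 else (-crash_penalty if crash else beta[x] * t[x])
             for x, a in zip(players, combo)]
        # Step 3: keep the combination with the largest total payoff.
        if sum(u) > omega_star:
            best_combo, omega_star, best_u = combo, sum(u), u
    # Step 4: zero threat point, so sigma_X = omega*/n - u_X.
    sigma = {x: omega_star / n - ux for x, ux in zip(players, best_u)}
    winner = next((x for x, a in zip(players, best_combo) if a == 1), None)
    return winner, sigma

winner3, sigma3 = settle_tu_game({"A": 4.0, "B": 2.0, "C": 3.0},
                                 {"A": 1.5, "B": 1.0, "C": 0.5})
winner2, sigma2 = settle_tu_game({"A": 4.0, "B": 2.0}, {"A": 1.5, "B": 1.0})
```

In the three-player call A wins and pays 4.0, split equally between B and C; in the two-player call A wins and pays 3.0 to B.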

A. SIMULATION TOOL
We use a Cellular Automaton (CA) as the simulation tool. The traffic CA was first proposed by Nagel and Schreckenberg [30]. They split a road into small cells of equal length (usually 7.5 m). Every cell can be occupied by at most one vehicle. Time is also discrete (e.g., steps of 1 s). On a highway with a maximum speed of 135 km/h, a vehicle can move at most 5 cells/step when the cell length is 7.5 m and each step is 1 s. By checking the state of every cell at each step, we can track the trajectory of every vehicle. Besides, the CA captures the non-increasing relationship between spacing and speed, and can reproduce the correct macroscopic behaviour [31]. We use Matlab to run the CA simulation experiments. In our simulation, we treat every single vehicle as a unique agent and track its microscopic behaviors, such as car-following and lane-changing, according to the CA simulation. Each vehicle corresponds to a unique number, and we record its information at every step, including the location, speed, LC request, VOT, etc. The detailed CA simulation rules in our experiments are shown in Algorithm 1.
Algorithm 1 focuses on a multi-lane road simulation, where $L_{cs}$ is the cell size, $N_c$ is the number of cells, $N$ is the number of lanes, $T_{ts}$ is the total simulation time, $T_r$ is the time resolution, $Q_{ir}$ is the inflow rate, $p_{sd}$ is the probability of slow-down, $p_{tv}$ is the percentage of TVs, $v_{\max}^n$ is the maximum speed of lane $n$ ($n = 1 : N$), and $v_{\max} =$ […]
[…] [32]. The percentage of TVs with the corresponding VOT is 20%, 30%, 30%, and 20%, respectively. The speed adjustment time $\tau$ is set to 1 s, and the mean acceleration rate $a$ is set to 2/3 cell/s$^2$. The road is designed as a circle without ramps to exclude mandatory LCs so that we can focus on the performance of the proposed model. Fig. 7 shows the difference in mean speeds across a range of traffic densities among TVs with different VOTs. Specifically, we see that the speeds of TVs increase with VOT, except under free-flow conditions and in totally jammed traffic. This improvement is more obvious when the traffic is relatively heavy. This implies that a TV with a higher VOT is more likely to win an LC game, and hence can travel faster.
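To make the CA mechanics concrete, the sketch below implements one update of the standard single-lane Nagel-Schreckenberg rules on a ring road, together with the cell/step unit conversion. It is not the multi-lane Algorithm 1 of this paper; the function names and the example inputs are assumptions for illustration.

```python
import random

CELL_M, STEP_S = 7.5, 1.0   # cell length (m) and time step (s) quoted in the text

def kmh_to_cells_per_step(v_kmh):
    """Convert a speed in km/h to whole cells per step."""
    return round(v_kmh / 3.6 * STEP_S / CELL_M)

def nasch_step(pos, vel, n_cells, v_max, p_slow, rng=random):
    """One parallel update of a single-lane ring road.

    pos: cell indices listed in driving order (vehicle i+1 is the leader of i).
    vel: current speeds in cells/step.
    """
    n = len(pos)
    new_vel = []
    for i in range(n):
        leader = pos[(i + 1) % n]
        gap = (leader - pos[i] - 1) % n_cells   # empty cells to the leader
        v = min(vel[i] + 1, v_max)              # rule 1: accelerate
        v = min(v, gap)                         # rule 2: brake to avoid a collision
        if v > 0 and rng.random() < p_slow:
            v -= 1                              # rule 3: random slow-down
        new_vel.append(v)
    new_pos = [(x + v) % n_cells for x, v in zip(pos, new_vel)]  # rule 4: move
    return new_pos, new_vel
```

For instance, `kmh_to_cells_per_step(135)` gives 5, matching the 5 cells/step quoted above, and `nasch_step([0, 3], [0, 0], 10, 5, 0.0)` moves both vehicles forward by one cell.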

C. IMPACT ON TRAFFIC
One might suggest that the direct-transaction framework will stimulate more frequent LCs. However, since the game winner can be either the lane changer or the lane keeper, this suggestion is not necessarily true. By varying the TV percentage (PR) from its default value, Fig. 8 shows how the density and the TV percentage affect the LC frequency. As can be seen from Fig. 8, the LC frequency reaches its highest value when the density is around 90 veh/km/lane for a fixed TV percentage, and the LC frequency is (generally) higher at a lower TV percentage for a fixed density. A higher PR helps reduce the LC frequency. That is, if more vehicles join the direct-transaction game, we will see less frequent LC behaviors. This is the exact opposite of the suggestion that direct-transaction will increase the LC frequency. We next show how the density and the TV percentage affect the transaction frequency in Fig. 9. We can see that the transaction frequency reaches its highest value when the density is around 90 veh/km/lane for a fixed TV percentage. For a fixed density, the transaction frequency decreases dramatically as the TV percentage decreases. This is intuitive since transactions rely on TVs, and there will be no transactions at all if all vehicles are NTVs.
In general, the density's impacts on the LCs and on the transactions are similar. When the traffic is totally jammed (133.3 veh/km/lane) or in free flow (0 veh/km/lane), there is neither LC nor transaction. When the density is about 90 veh/km/lane, both frequencies reach their maximum. However, the TV percentage has opposite impacts on the LC and transaction frequencies. A higher TV percentage means a higher probability of playing the TU game, hence the transaction frequency increases dramatically with the TV percentage. "Surprisingly", a higher transaction frequency does not result in more frequent LCs, but reduces the LC frequency instead.

Algorithm 1: CA Simulation of Traffic Dynamics
Input: $L_{cs}$, $N_c$, $N$, $T_{ts}$, $T_r$, $Q_{ir}$, $p_{sd}$, $p_{tv}$, $v_{\max}^n$, $v_{\max}$, vehicles' VOTs and initial positions.
[…]
    […], $v_{\max}^{n-1}$).
    If $n < N$:
        $d_{right} \leftarrow$ distance to leader in right neighbor lane (# cells).
[…]
For each potential lane changer, from the most downstream and leftmost:
    If it has conflicting vehicle(s), play a game $\mathcal{X} = \{A, B\}$ or $\mathcal{X} = \{A, B, C\}$:
        Calculate payoffs.
        If all players are TVs:
            Play TU game: find $(i^*, j^*, k^*)$ (or $(i^*, j^*)$) and $\sigma_X$.
        Else If number of TVs is not more than one:
            Play NTU game: each vehicle has equal opportunity to win the game.
        […]

To explore the reason behind the different impacts of PR on the LC and transaction frequencies, we illustrate the percentage of winners in TU games under different TV percentages and densities in Fig. 10. As shown in Fig. 10, the win percentage of lane keepers ranges from 55% to 90% in all tested scenarios. That is because a lane change usually affects the lane keeper's speed considerably, hence the PTS of the lane keeper is usually higher than the PTS of the lane changer. The lane keepers can therefore win much more easily in our TU mechanism. This explains why more transactions result in fewer LCs.

D. BENEFIT OF TVS
We investigate the "benefit" of TVs in this experiment. The benefit index, denoted by $B$, is calculated from the following quantities: $\tau^{tr}$ is the average travel time of all vehicles $i$ in the set of TVs $V$, $\Delta T_{i,g}^{tr}$ is the travel time difference (saved: positive, wasted: negative or 0) of vehicle $i$ in the lane change game $g$, $G_i$ is the set of lane change games played by vehicle $i$, and $P_{i,g}$ is the monetary transfer for vehicle $i$ in game $g$ (income: positive, payment: negative, or 0). The total benefit per hour of travel under varying densities for TVs with different VOTs is illustrated in Fig. 11.
For TVs of all VOTs, the benefit is positive in general. Furthermore, as the PR increases, the total benefit tends to grow. We see the same pattern as the traffic density increases from 0 to 120 veh/km/lane. For densities approaching the jam density (133 veh/km/lane in our simulations), LC becomes more difficult and the benefit approaches zero. Hence, the highest benefit to TVs of all VOTs occurs around heavy congestion, where vehicles can still perform LC maneuvers. Similarly, the simulation results show that in free-flow conditions (density < 10 veh/km/lane), heavily jammed conditions (density > 131 veh/km/lane), or when the PR is very low (< 0.05), the benefit is very small (between -0.2% and 0.2%). Because the benefit of TVs is positive in general, some NTVs may be encouraged to join TU games. Moreover, higher PRs mean higher benefits. Hence joining TU games can result in increased benefits to all vehicles.

E. "VIP" TVS
In this experiment, we create 1% "VIP" TVs with high VOTs varying from 30 dollars/h to 105 dollars/h while holding all else fixed. The aim is to test how much time a vehicle can save if it is in an emergency with very high VOTs. The results are illustrated in Fig. 12.
As we can see, in moderate to very high congestion, the "VIP" TVs save about 10-42% in travel time when the VOT increases from 30 dollars/h to 105 dollars/h. Interestingly, however, we observe a bound on the time saving of about 42%, which, once reached, cannot be improved further with a greater payment (a greater VOT does not help when it is larger than 60 dollars/h). This bound can be attributed to the "VIP" TV being blocked by its leaders. It may be broken if the TV is allowed to engage in transactions with multiple vehicles simultaneously, namely, the lag vehicle in the target lane and the leaders in both the current and target lanes.

V. CONCLUSION
As one of the pioneering works using a direct-transaction mechanism for discretionary Lane Changes (LCs) in a connected vehicle environment, [22] proposed a two-player LC game framework that can be used on highways with two lanes. This paper proposes a four-step three-player game framework with direct-transactions for discretionary LCs. We show that, by combining the proposed three-player game with the two-player game in the previous work, one can deal with all discretionary LC scenarios on multi-lane highways (with three or more lanes). More complicated traffic scenarios, on highways with fast and slow lanes and with more than two values of time, are tested using cellular automaton simulations. Results show that the proposed direct-transaction LC games can avoid some "unnecessary" LCs, improve the benefit of all transaction vehicles in expectation, and help urgent vehicles with a high value of time to save up to 42% of their travel time when traffic is heavy under the simulation settings. This paper focuses on discretionary LCs, and hence the experiments did not consider the influence of mandatory LCs. One can find studies that focus on mandatory LCs in the literature, e.g., [23]. When implementing the mechanism in the real world with connected vehicles in the future, one can choose between the discretionary and mandatory LC models accordingly. An interesting topic for future research is to investigate the performance of the direct-transaction LC mechanism in more complicated road scenarios with both discretionary and mandatory LCs.