Active Supervision Strategies of Online Ride-Hailing Based on the Tripartite Evolutionary Game Model

As an important mode of passenger transportation in the era of the sharing economy, online ride-hailing (ORH) has improved resource allocation but has also created new traffic management issues. Although regulations and policies have imposed macro-level supervision on the ORH market, they have not prevented, at the source, some drivers from cheating on platforms' subsidies and jeopardizing passengers' safety. To realize voluntary and sustainable ORH supervision, and to enable the relevant participants to actively supervise, report, and comply with rules, this paper constructs an evolutionary game model among the platform, passengers, and drivers. Based on the bounded rationality and expected benefits of the participants, the main factors determining the optimal strategies are analyzed. The evolution paths and equilibrium states of the three game groups are then studied by numerical simulation. The results show that the important factors for realizing benign ORH supervision include minimizing passengers' reporting costs, making the penalties for drivers who violate the rules far greater than their illicit incomes, and keeping the platform's supervision costs below the sum of penalty incomes and positive social effects. In addition, raising the rewards for reporting can sustain passengers' participation but also increases the possibility of false reports. Therefore, the platform needs to consider the cost of identifying false information when designing the reward amount.


I. INTRODUCTION
The continuous advancement of mobile internet technology has driven the vigorous development of the sharing economy and the in-depth integration of various resources. As a new representative of travel modes, online ride-hailing (ORH) is an extremely effective form of market-oriented resource allocation under the theory of the sharing economy [1]. In ORH, an online platform integrates information with the help of mobile internet technology and provides timely, matched information to drivers and passengers through an application, so as to meet passengers' travel needs [2]. Since the establishment of YiDao, China's first online taxi booking service platform, in 2010, many ORH platforms, such as DiDi and UCAR, have gradually emerged in China's travel market.
The associate editor coordinating the review of this manuscript and approving it for publication was Hisao Ishibuchi.
The original intention of the ORH platform is to add vitality to the passenger transport market and promote the benign development of the travel industry. However, the ORH management mode is mainly based on the competition mechanism of the free market, which has caused some negative effects in the actual operation [3]. For example, in order to get extra incomes, some drivers pick up passengers with the similar route on two different ORH platforms at the same time, forcing passengers who choose not to carpool to ride with others. What is more, drivers can set up public evaluation labels for the passengers they have picked up, which exposes passengers' personal information and increases their security risks. Passengers can also evaluate the services of the drivers, and some untrue evaluations will adversely affect the reputation and work of the drivers. In addition, some ORH platforms lack complete and efficient supervision systems, which makes them difficult to quickly handle complaints from drivers and passengers, resulting in some personal injuries and property losses that could have been avoided.
It is urgent to formulate reasonable and effective ORH supervision strategies. Most of the existing literature focuses on strategies with government participation, but strategies should not be formulated solely from the external perspective of mandatory government intervention [4], [5]. Although the corresponding laws and regulations can maintain the order of the industry from a macro perspective [6], they cannot enable ORH participants to actively implement supervision and comply with rules. Therefore, this paper starts from the platform's own plan, combines it with the expected benefits of passengers and drivers to formulate supervision strategies, and explores how ORH supervision can work actively through the cooperation of the platform, passengers, and drivers without the help of the government.
In view of the limitations of current statistical methods for the ORH market in data acquisition and calculation accuracy, this paper adopts game theory to analyze the important factors affecting supervision from the perspective of participants' expected benefits. Because no participant can obtain all the information needed to make the optimal decision in practice, that is, all participants are boundedly rational [7], this paper adopts evolutionary game theory, which satisfies this premise, to analyze the behavioral relationships of the ORH participants. In addition, the goal of the evolutionary game is to find the evolutionary stability strategies of the participants, which is in line with the strategic expectation of this paper that passengers actively cooperate with the supervision of the ORH platform and drivers comply with the regulations.
The remainder of this paper is organized as follows. In Section II, the existing literature is reviewed from three aspects: game groups, the evolutionary game, and the regulatory situation. In Section III, an evolutionary game model composed of the ORH platform, drivers, and passengers is designed, and the model parameters, behavior strategies, payoff matrices, and expected benefits are analyzed in detail. Section IV analyzes the evolutionary stability strategies of each game group based on the replicator dynamics equations, and analyzes the overall stability of the three groups based on the equilibrium points and the Jacobian matrix. In Section V, the evolution paths of participants in the initial state and under parameter variation are discussed by numerical simulation. Section VI summarizes the whole paper from three aspects: research conclusions, improvement suggestions, and future research directions.

II. LITERATURE REVIEW
Firstly, in order to accurately reflect the behavioral relationships and the expected benefits among game groups, this section reviews the literature from the perspectives of ORH platforms, drivers, and passengers. Secondly, in order to fit the research content of this paper on ORH supervision, this section reviews the literature on evolutionary games in the fields of transportation and regulation. Finally, in order to ensure the rationality of the proposed ORH supervision model, this section reviews the literature on ORH supervision from two aspects: regulatory policies and market supervision.

A. GAME GROUPS
From the perspective of ORH platforms, the existing literature mainly studies operation modes and market impacts. Saadi et al. [8] used a machine learning approach to characterize and forecast the short-term demand of the ORH platform, and proposed a spatio-temporal estimation function that includes traffic, pricing, weather conditions, and other factors. Harding et al. [9] studied the impacts of the ORH platform on the taxi market; they found problems such as unstable supply and demand relations, and suggested that the possibility of monopoly and collusion in the application-oriented taxi market should be reduced. Watanabe et al. [10] analyzed the platform ecosystem architecture of Uber, and the research showed that the two sides of information and communication technology were attributed to the virtuous cycle of price decline and travel increase.
For passengers, preferences and service satisfaction are the main research topics. Rayle et al. [11] found that the main reasons why passengers chose the ORH service were convenient payment, time saving, and travel efficiency. Dawes [12] studied Americans' perception of Uber's technologies and services, and investigated the reasons why passengers used the platform and its share of their travel. Luo et al. [13] proposed a privacy-preserving scheme for the ORH service that allowed passengers to be matched efficiently with drivers based on their distances in the road network without revealing their locations.
Most of the literature about drivers focuses on behavior characteristics and rights protection. Xu et al. [14] examined the factors that affected drivers' responses to ORH requests; they showed that drivers were more likely to respond to requests with economic incentives, and to those with a lower spatio-temporal demand intensity or a higher spatio-temporal supply intensity. Griffin et al. [15] studied the effectiveness of various ways of hiring ORH drivers, and found that media interaction could improve the registration rate of drivers. Zou [16] argued that the existing criteria in Chinese labor law for ascertaining the status of ORH drivers were useful for addressing the basic question of whether drivers should be protected.

B. EVOLUTIONARY GAME
In 1978, Taylor and Jonker's research on the relationship between stable evolutionary equilibrium and dynamics promoted the development of evolutionary game theory [17]. Then, through its combination with traditional game-theoretic concepts such as the Nash equilibrium, the evolutionary game was further applied to the field of economic management. The evolutionary stable equilibrium was extended to the stochastic stable equilibrium, and the replicator dynamics mechanism was upgraded to the stochastic individual dynamic learning mechanism [18]-[21]. An evolutionary game reaches the evolutionary stable state of a system through repeated games with random pairing among groups and continuous strategy optimization under the replicator dynamics mechanism [22]. In recent years, research on evolutionary game theory has developed in a variety of fields, including environmental protection, financial development, transportation management, and public supervision.
In the aspect of supervision [23]-[25], Shen and Wang [26] used the evolutionary game to design two government haze prevention mechanisms, centralized supervision and long-term supervision, and proved that long-term supervision was superior to centralized supervision. Shen et al. [27] proposed an evolutionary game model for studying the behavioral decision-making of stakeholders in construction and demolition waste recycling under environmental supervision; they found that production cost, technology, subsidies, and recycling benefits exerted certain influences on the ideal stable state. Zhang et al. [28] explored the policy regulation of the conversion of kitchen waste oil into energy fuel by constructing an evolutionary game model, and found that the government should eliminate the garbage disposal fee charged by restaurants and increase the quantity-based subsidy to biofuel enterprises.
In the aspect of transportation [29], [30], Encarnação et al. [31] investigated the feasibility of breaking the dilemma of electric vehicles by resorting to the evolutionary game mechanism; their findings suggested that full adoption of electric vehicles required coordination among governments, companies, and consumers. Xue et al. [32] proposed a private capital investment method in public transportation considering passengers' values, and constructed an evolutionary game model to quantify the impacts of private capital investment. Keivanpour et al. [33] combined evolutionary game theory with the fuzzy rule approach to analyze the green environmental strategies of automobile manufacturers, so as to fully take into account the competitive advantages of implementing these strategies and the interaction between participants.

C. REGULATORY SITUATION
In research on ORH regulation, most scholars focus on regulatory policies and market supervision. Because countries around the world have issued a number of regulations and policies on the ORH market, there are many studies analyzing government measures. In order to balance social governance and promote technological development, the government of Singapore has adopted a gradually strengthened regulatory approach, and the intensity and scope of its regulation of ORH enterprises have been optimized in recent years [34]. The transport bureau of London, UK, has replaced development restriction with reasonable regulation, and focused on promoting market equity, user safety, and environmental protection [35]. The ORH regulatory measures in the United States have been mainly divided into four aspects: quantity control and fare control, status recognition of enterprises, job qualification of drivers, and protection of other interest groups such as passengers [36]. In China, governments in first-tier cities such as Beijing, Shanghai, and Tianjin have imposed strict restrictions on ORH drivers' household registration, license plates, and vehicle specifications [37].
Because the ORH industry is in an era of rapid development of information technology and market demand, many scholars have also focused on the innovation of market supervision. Wyman [38] thought that the governments could refer to relevant standards of taxis when formulating the ORH regulatory standards, because they were substitutes for citizens' daily travel. At the same time, based on the principle of benefit maximization, specific regulatory standards could be formulated for ORH and taxi respectively. Dudley et al. [6] analyzed that the challenges that governments and regulatory authorities need to solve were to maintain the expansion trend of ORH industry and redefine its relevant legal terms. Posen [39] proposed that the government should accept the market competition brought by the new industry and improve the regulatory safety of the ORH industry through experimental regulatory measures.

D. SUMMARY
Much research has been produced on the ORH industry, but the relevant literature still has some deficiencies for this complex business field composed of platforms, passengers, drivers, and governments. First, although many scholars have studied the regulatory policies and measures of the ORH industry under different national conditions, they mainly conduct qualitative discussions through questionnaire surveys and empirical analysis, and lack objective description and quantitative research through mathematical models. Secondly, most of the current studies on ORH participants start from the positive effects of the industry, but lack an analysis of the decision-making behavior of multiple participants when both positive and negative effects are present. Finally, existing research mainly takes the single perspective of platforms, users, or governments, and multi-dimensional regulatory research on the ORH industry is lacking. Research on ORH supervision should consider not only the sustainability and effectiveness of the mechanism, but also the behavioral factors of multiple participants.
In addition, one study has confirmed the feasibility of applying evolutionary game theory to the problem of ORH supervision [40], and it also took the ORH platform, drivers, and passengers as the main players. However, there are some differences between this paper and that study. First, that study still took the government's reward and punishment measures as the main strategic factors in the evolution process, so it could not clearly distinguish the roles played by the platform's own supervision and the government's external supervision in reaching the final stable state. This paper focuses on how to implement an active ORH supervision mechanism without the influence of government intervention. Secondly, the simulation experiments in that study mainly examined the influence of the different initial proportions of each group on the strategy evolution path, whereas the simulation experiments in this paper focus on the impact of changes in important parameters on the evolution path of each group's strategy. Finally, the two differ in the setting of group strategies and model parameters.
In this paper, the government, which is commonly used in regulatory game decision-making, is replaced by the passenger group, so as to form an active supervision model based on the ORH service and the participants themselves. In addition, the number of game participants in traffic problems is increased from the traditional two groups to three, so an evolutionary game supervision model of "platform-drivers-passengers" is constructed. The evolutionary stability strategy and ideal state of each game group are studied through model derivation and numerical simulation, so as to realize the benign and effective supervision of ORH.

III. CONSTRUCTION OF EVOLUTIONARY GAME MODEL

A. MODEL HYPOTHESES
Based on the analysis of the operation of the ORH industry and the behavior of its participants, this paper makes the following three hypotheses:
1) In this supervision model, there are only three game groups: the ORH platform, drivers, and passengers.
2) The game groups have bounded rationality, and they cannot make strategies that maximize their own benefits.
3) Passengers and drivers have no way of knowing whether the ORH platform has chosen to supervise.
The first hypothesis is set because these three game groups best suit the theme of this paper, which studies the active supervision strategies of ORH through a tripartite evolutionary game.
The second hypothesis is set because, in the actual decision-making process, people cannot obtain all the information related to a decision. Therefore, they cannot rationally choose a benefit-maximizing strategy from a global perspective. However, people are able to learn and imitate, and they can continually adjust their strategies as the situation changes, which forms the evolution process of strategies.
The third hypothesis is set because, if passengers and drivers could clearly know whether the ORH platform has chosen supervision, the model could not reflect the uncertainty of the decision-making process and could hardly capture some key factors that affect the supervision strategies.
The model parameters and definitions of ORH supervision are shown in Table 1.

B. ANALYSIS OF STRATEGIES
In the tripartite evolutionary game model of ORH supervision, the platform has two alternative strategies: supervision and non-supervision. On the one hand, the platform can choose to implement a supervision mechanism to punish drivers who violate regulations and reward passengers who report violations, so that the platform and passengers maintain a mutually beneficial cooperative relationship [41] and produce positive social effects. On the other hand, the platform can choose to ignore violations to reduce the corresponding management costs. However, in the long run, some negative social effects will be generated: vicious competition among drivers will increase, passengers' safety and property will be threatened, and the order of the passenger transport market will be disrupted.
Drivers have two alternative strategies: violation and non-violation. On the one hand, drivers can follow the regulations of the ORH platform to create a good market environment. On the other hand, drivers may violate regulations for illicit incomes, thereby increasing the risk of being reported by passengers. In order to avoid reputation damage and platform penalties, drivers who violate regulations would choose to pay the corresponding costs to make passengers give up reporting.
Passengers have two alternative strategies: reporting and non-reporting. In one situation, drivers violate the rules, and passengers can either bear a cost to report the violations and receive rewards [42], or tolerate the violations and receive compromise gains. Whichever choice passengers make, drivers' violations cause them losses of money and time. In the other situation, passengers report falsely when drivers have not broken the rules. In this situation, if the platform carries out supervision, the false information will be identified and drivers will not be punished; passengers will not be held accountable either, because of the service principle. If the platform does not implement a regulatory mechanism, the improper comments or extremely low scores posted by passengers on the ORH platform will damage drivers' reputations.

C. EXPECTED BENEFIT AND PAYOFF MATRIX
The tripartite payoff matrix of the ORH supervision is shown in Table 2, where $a_i$, $b_i$, and $c_i$ represent the payoffs of passengers, drivers, and the platform, respectively.
In the initial stage of the three game groups, suppose that the proportion of passengers selecting reporting is $x$, so the proportion selecting non-reporting is $1 - x$. Similarly, the proportion of drivers selecting violation is $y$ and the proportion selecting non-violation is $1 - y$; the proportion of the platform selecting supervision is $z$ and the proportion selecting non-supervision is $1 - z$. Obviously, $0 < x < 1$, $0 < y < 1$, and $0 < z < 1$. Because the payoff of each game group is affected by the strategies of the other two game groups, there are eight combinations of strategies for passengers, drivers, and the platform: (reporting, violation, supervision), (reporting, non-violation, supervision), (non-reporting, violation, supervision), (non-reporting, non-violation, supervision), (reporting, violation, non-supervision), (reporting, non-violation, non-supervision), (non-reporting, violation, non-supervision), and (non-reporting, non-violation, non-supervision). The payoff of each combination is shown in (1)-(8).
Suppose that $U_{A1}$ represents the expected benefit of passengers who adopt reporting, $U_{A2}$ represents the expected benefit of passengers who adopt non-reporting, and $U_A$ represents the average expected benefit of passengers, as shown in (9)-(11).
Suppose that $U_{B1}$ and $U_{B2}$ respectively represent the expected benefits of the drivers' strategies "violation" and "non-violation", and $U_B$ indicates the average expected benefit of drivers, as shown in (12)-(14).
Similarly, $U_{C1}$ and $U_{C2}$ are the expected benefits of the platform employing the strategies "supervision" and "non-supervision", and $U_C$ indicates the average expected benefit of the platform, as shown in (15)-(17).
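The average expected benefits in (11), (14), and (17) presumably take the usual population-weighted form of evolutionary game models. The sketch below states that form as an assumption; the strategy-specific benefits $U_{A1}, \dots, U_{C2}$ themselves depend on the entries of Table 2 and are not reproduced here.

```latex
\begin{align}
U_A &= x\,U_{A1} + (1 - x)\,U_{A2}, \\
U_B &= y\,U_{B1} + (1 - y)\,U_{B2}, \\
U_C &= z\,U_{C1} + (1 - z)\,U_{C2}.
\end{align}
```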

IV. EQUILIBRIUM ANALYSIS OF THE EVOLUTIONARY GAME MODEL

A. REPLICATOR DYNAMICS
The replicator dynamics equation of ORH passengers is shown in (18).
The replicator dynamics equation of ORH drivers is shown in (19).
The replicator dynamics equation of the ORH platform is shown in (20).
To obtain the equilibrium solution of the tripartite evolutionary game under ORH supervision, the three replicator dynamics equations must be solved simultaneously, as shown in (21).
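For reference, the system in (21) can be sketched in the standard replicator form, in which each share grows in proportion to the gap between a strategy's expected benefit and the group average. The bracketed expressions are not copied from (18)-(20); they are inferred from the stability conditions stated in Section IV-B, so they should be read as a reconstruction under that assumption.

```latex
\begin{align}
F(x) &= \frac{dx}{dt} = x\,(U_{A1} - U_A)
      = x(1-x)\,\bigl[\,yz\,p_1 q - C_3 - y\,p_2 (Q-q)\,\bigr], \\
G(y) &= \frac{dy}{dt} = y\,(U_{B1} - U_B)
      = y(1-y)\,\bigl[\,x\,(p_2 (Q-q) + L_2) - p_2 (Q-q) - xz\,(p_3 q + L_2)\,\bigr], \\
H(z) &= \frac{dz}{dt} = z\,(U_{C1} - U_C)
      = z(1-z)\,\bigl[\,xy\,(-p_1 q + p_3 q + S_1) - y\,(S_1 - S_2) - C_1 + S_1\,\bigr].
\end{align}
```

Setting all three right-hand sides to zero gives the pure-strategy points with $x, y, z \in \{0, 1\}$, plus any interior point at which the three brackets vanish together.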
Solving (21) yields the eight pure-strategy equilibrium points $E_1$-$E_8$, which lie in the equilibrium solution domain of the evolutionary game [23]. In addition, $E(x^*, y^*, z^*)$, which can be obtained by solving (22), is also in this domain. Because the solution of $E(x^*, y^*, z^*)$ is complex, numerical simulation can be carried out based on practical meanings and given conditions.
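A numerical simulation of this kind can be sketched as follows. This is a minimal illustration, not the paper's simulation code: the right-hand sides are inferred from the stability conditions in Section IV-B, the names `Params`, `replicator`, and `simulate` are hypothetical, and every parameter value is invented for the example. The products $p_1 q$, $p_2 (Q - q)$, and $p_3 q$ are treated as single quantities because the text always uses them in that form.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Params:
    """Hypothetical parameter values; the field names are illustrative only."""
    reward: float      # p1*q, reward paid to reporting passengers
    compromise: float  # p2*(Q-q), compromise gain paid by drivers to passengers
    penalty: float     # p3*q, penalty paid by violating drivers
    C1: float          # platform supervision cost
    C3: float          # passenger reporting cost
    L2: float          # driver loss caused by false reports
    S1: float          # positive social effect of supervision
    S2: float          # negative social effect of non-supervision


def replicator(state: np.ndarray, p: Params) -> np.ndarray:
    """Right-hand side (dx/dt, dy/dt, dz/dt) of the assumed replicator system."""
    x, y, z = state
    a = y * z * p.reward - p.C3 - y * p.compromise
    b = x * (p.compromise + p.L2) - p.compromise - x * z * (p.penalty + p.L2)
    c = x * y * (-p.reward + p.penalty + p.S1) - y * (p.S1 - p.S2) - p.C1 + p.S1
    return np.array([x * (1 - x) * a, y * (1 - y) * b, z * (1 - z) * c])


def simulate(start, p: Params, t_end: float = 40.0, dt: float = 0.01) -> np.ndarray:
    """Integrate with classical RK4; returns the trajectory as an (n + 1, 3) array."""
    steps = int(round(t_end / dt))
    path = np.empty((steps + 1, 3))
    path[0] = start
    for i in range(steps):
        s = path[i]
        k1 = replicator(s, p)
        k2 = replicator(s + 0.5 * dt * k1, p)
        k3 = replicator(s + 0.5 * dt * k2, p)
        k4 = replicator(s + dt * k3, p)
        # Clip to guard against round-off pushing a share outside [0, 1].
        path[i + 1] = np.clip(s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4), 0.0, 1.0)
    return path
```

For instance, `simulate([0.5, 0.5, 0.5], Params(reward=2, compromise=1, penalty=5, C1=1, C3=3, L2=1, S1=2, S2=4))[-1]` gives the shares at the end of the run; with these invented values the supervision cost is below the positive social effect and the trajectory approaches (0, 0, 1).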
The derivative of the replicator dynamics equation of each game group can be obtained as shown in (23)-(25). According to the evolutionary game theory [43], the equilibrium point is substituted into (23)-(25). If $F'(x) < 0$, $G'(y) < 0$, and $H'(z) < 0$, the strategy represented by this equilibrium point is the evolutionary stability strategy (ESS) of the ORH supervision. Based on this, the following paragraphs respectively analyze the ESS of each game group.

B. EVOLUTIONARY STABILITY STRATEGIES
For the passenger group, the ESS can be inferred from its replicator dynamics equation (18) together with those of the other two groups, (19) and (20).
1) When $z = \frac{C_3 + y p_2 (Q - q)}{y p_1 q}$, $F(x) \equiv 0$, so all levels of $x$ are stable and the stability of $x$ depends on the initial state.
2) When $z \neq \frac{C_3 + y p_2 (Q - q)}{y p_1 q}$, $x = 0$ and $x = 1$ are the two solutions of $F(x) = 0$, namely the two candidate stable states of $x$. Therefore, in order to obtain passengers' equilibrium strategy, $\frac{dF(x)}{dx} < 0$ should be satisfied, as shown in the following analysis.
a. When $C_3 > p_1 q - p_2 (Q - q)$, under the constraints $0 < y < 1$ and $0 < z < 1$, $y z p_1 q - C_3 - y p_2 (Q - q) < 0$. Therefore, $F'(0) < 0$, and $x = 0$ is the ESS. This indicates that when drivers violate the rules and the cost of reporting to the platform is greater than the difference between reporting rewards and compromise gains, not reporting is the ESS of passengers.
b. When $C_3 < p_1 q - p_2 (Q - q)$, there are the following two situations.
When $0 < z < \frac{C_3 + y p_2 (Q - q)}{y p_1 q}$, $F'(0) < 0$, so $x = 0$ is the ESS. In other words, when drivers' violations are discovered, most passengers are willing to compromise.
When $1 > z > \frac{C_3 + y p_2 (Q - q)}{y p_1 q}$, $F'(1) < 0$, so $x = 1$ is the ESS. In other words, passengers tend to choose to report drivers' violations.
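The case analysis for passengers reduces to comparing $z$ with a single threshold, which the following sketch makes explicit. The function name `passenger_ess` and its argument names are hypothetical, and the products $p_1 q$ and $p_2 (Q - q)$ are passed as single numbers.

```python
from typing import Optional


def passenger_ess(y: float, z: float, reward: float, compromise: float, C3: float) -> Optional[int]:
    """Return the stable reporting share x (0 or 1), or None when F(x) is identically zero.

    reward = p1*q, compromise = p2*(Q-q), C3 = reporting cost; assumes 0 < y < 1 and 0 < z < 1.
    """
    z_star = (C3 + y * compromise) / (y * reward)  # threshold on the supervision share z
    if z == z_star:
        return None  # every x is stationary, so the outcome depends on the initial state
    # z > z_star makes y*z*reward - C3 - y*compromise positive, so x = 1 is stable.
    return 1 if z > z_star else 0
```

Case a needs no separate branch: when $C_3 > p_1 q - p_2 (Q - q)$ the threshold exceeds 1, so every admissible $z$ falls below it and the function returns 0.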
Given that the other two groups remain invariant, the replicator dynamics phase diagram of passengers is shown in Figure 1. In Figure 1(b), the arrows point in the direction $x = 0$, which means that the passenger group tends to select non-reporting when $C_3 > p_1 q - p_2 (Q - q)$. In Figure 1(c), the blue arrow points in the direction $x = 0$, which means that the passenger group tends to select non-reporting when $0 < z < \frac{C_3 + y p_2 (Q - q)}{y p_1 q}$, and the gray arrow points in the direction $x = 1$, which means that the passenger group tends to select reporting when $1 > z > \frac{C_3 + y p_2 (Q - q)}{y p_1 q}$.
For the driver group, the ESS can be inferred from its replicator dynamics equation (19) together with those of the other two groups, (18) and (20).
1) When $z = \frac{x [p_2 (Q - q) + L_2] - p_2 (Q - q)}{x (p_3 q + L_2)}$, $G(y) \equiv 0$, so all levels of $y$ are stable and the stability of $y$ depends on the initial state.
2) When $z \neq \frac{x [p_2 (Q - q) + L_2] - p_2 (Q - q)}{x (p_3 q + L_2)}$, $y = 0$ and $y = 1$ are the two solutions of $G(y) = 0$, namely the two candidate stable states of $y$. Therefore, in order to obtain drivers' equilibrium strategy, $\frac{dG(y)}{dy} < 0$ should be satisfied, as shown in the following analysis.
Obviously, the total penalties $p_3 q$ paid by drivers to the platform and the losses $L_2$ caused to drivers by passengers' false reports are both greater than zero, that is, $-x (p_3 q + L_2) < 0$.
a. When $x [p_2 (Q - q) + L_2] - p_2 (Q - q) < 0$, $G'(0) < 0$. Therefore, $y = 0$ is the ESS. This indicates that when the compromise costs paid by drivers to passengers for concealing violations are greater than the losses caused by false reporting, drivers are more willing to choose the non-violation strategy.
b. When $x [p_2 (Q - q) + L_2] - p_2 (Q - q) > 0$, there are the following two situations.
When $1 > z > \frac{x [p_2 (Q - q) + L_2] - p_2 (Q - q)}{x (p_3 q + L_2)}$, $G'(0) < 0$. So $y = 0$ is the ESS, and the driver group eventually evolves toward complying with regulations.
When $0 < z < \frac{x [p_2 (Q - q) + L_2] - p_2 (Q - q)}{x (p_3 q + L_2)}$, $G'(1) < 0$. So $y = 1$ is the ESS, and the driver group eventually evolves toward violation.
VOLUME 8, 2020
Given that the other two groups remain invariant, the replicator dynamics phase diagram of drivers is shown in Figure 2. In Figure 2(b), the arrows point in the direction $y = 0$, which means that the driver group tends to select non-violation when $x < \frac{p_2 (Q - q)}{p_2 (Q - q) + L_2}$. In Figure 2(c), the blue arrow points in the direction $y = 0$, which means that the driver group tends to select non-violation when $1 > z > \frac{x [p_2 (Q - q) + L_2] - p_2 (Q - q)}{x (p_3 q + L_2)}$, and the gray arrow points in the direction $y = 1$, which means that the driver group tends to select violation when $0 < z < \frac{x [p_2 (Q - q) + L_2] - p_2 (Q - q)}{x (p_3 q + L_2)}$.
For the platform group, the ESS can be inferred from its replicator dynamics equation (20) together with those of the other two groups, (18) and (19).
1) When $y = \frac{C_1 - S_1}{x (-p_1 q + p_3 q + S_1) - S_1 + S_2}$, $H(z) \equiv 0$, so all levels of $z$ are stable and the stability of $z$ depends on the initial state.
2) When $y \neq \frac{C_1 - S_1}{x (-p_1 q + p_3 q + S_1) - S_1 + S_2}$, $z = 0$ and $z = 1$ are the two solutions of $H(z) = 0$, namely the two candidate stable states of $z$. Therefore, in order to obtain the equilibrium strategy of the platform, $\frac{dH(z)}{dz} < 0$ should be satisfied, as shown in the following analysis.
a. When $C_1 - S_1 > -p_1 q + p_3 q + S_2$, under the constraints $0 < x < 1$ and $0 < y < 1$, $x y (-p_1 q + p_3 q + S_1) - y (S_1 - S_2) - C_1 + S_1 < 0$ can be proved. Therefore, $H'(0) < 0$, and $z = 0$ is the ESS. This indicates that when the supervision cost net of the positive social effect exceeds the penalty revenue from drivers minus the reward expenditure for passengers plus the negative social effect, the platform will prefer the non-supervision strategy.
b. When $0 < C_1 - S_1 < -p_1 q + p_3 q + S_2$, there are two situations. It should be noted that when drivers violate regulations, the negative effect $S_2$ of the platform's deregulation will be greater than the positive effect $S_1$ of the platform's regulation. This is because the damage to passengers' property and personal safety cannot be totally remedied, and it could even cause the public to lose confidence in the ORH industry.
When $0 < y < \frac{C_1 - S_1}{x (-p_1 q + p_3 q + S_1) - S_1 + S_2}$, $H'(0) < 0$. So $z = 0$ is the ESS, and the platform is more willing to opt out of regulation.
When $1 > y > \frac{C_1 - S_1}{x (-p_1 q + p_3 q + S_1) - S_1 + S_2}$, $H'(1) < 0$. So $z = 1$ is the ESS, and the platform is more willing to implement the supervision strategy.
Given that the other two groups remain invariant, the replicator dynamics phase diagram of the platform is shown in Figure 3. In Figure 3(b), the arrows point in the direction $z = 0$, which means that the platform group tends to select non-supervision when $C_1 - S_1 > -p_1 q + p_3 q + S_2$. In Figure 3(c), the gray arrow points in the direction $z = 0$, which means that the platform group tends to select non-supervision when $0 < y < \frac{C_1 - S_1}{x (-p_1 q + p_3 q + S_1) - S_1 + S_2}$, and the blue arrow points in the direction $z = 1$, which means that the platform group tends to select supervision when $1 > y > \frac{C_1 - S_1}{x (-p_1 q + p_3 q + S_1) - S_1 + S_2}$.

C. ANALYSIS OF MODEL STRATEGIES
From the evolutionary stability strategies of the three game groups, it can be seen that the proportion of reporting $x$, the proportion of violation $y$, and the proportion of supervision $z$ influence and restrict each other throughout the evolution process. At the same time, the stability of the equilibrium state is easily disturbed by the decision proportion of each game group, so it is difficult to steer the three groups to the anticipated state only by adjusting initial conditions. Therefore, participants' decisions can be guided in the ideal directions by adjusting relevant parameters. The optimal evolution directions of passengers, drivers, and the platform are truthful reporting, legal driving, and active supervision, respectively.
When the stability condition for reporting, which involves the term $y p_1 q$, holds, ORH passengers tend to report violations. As $p_1 q$ increases, the denominator in this condition grows and the inequality is more likely to hold, which is conducive to the evolution of the passenger group toward the positive decision. Therefore, the platform should increase the rewards for reporting true information and encourage passengers to actively report violations. In addition, the platform should try to reduce the reporting costs of passengers and improve the feedback efficiency of reporting results. For example, the platform could establish a special complaint channel that resolves problems at any time to ensure efficient and convenient service. Specific handling plans could also be set up for different levels of reports, and reports involving the personal safety of passengers should be listed as the most urgent cases.
When the corresponding stability condition for drivers holds, ORH drivers will be more willing to follow the platform rules. Therefore, it is necessary to increase the probability of investigation and punishment when drivers violate regulations, while protecting the legitimate rights and interests of drivers as much as possible. At the same time, the platform should increase violation penalties. The penalties should include not only higher fines and longer driving bans, but also cancellation of welfare benefits and criminal penalties for serious cases. Moreover, it is necessary to reduce the regulatory costs of the platform and give full play to the positive social effects brought by supervision.
When $C_1 + p_1 q < p_3 q + S_1 + S_2$ and $1 > y > \frac{C_1 - S_1}{x(-p_1 q + p_3 q + S_1) - S_1 + S_2}$, the ORH platform will lean toward the supervision strategy. As $p_3 q$ increases, this inequality is more likely to hold, which once again demonstrates the importance of the platform strengthening the punishment of drivers who violate rules.
VOLUME 8, 2020
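The link between the two platform conditions can be made explicit. The following is only a sketch: it assumes the threshold for $y$ is the ratio $(C_1 - S_1)/[x(-p_1 q + p_3 q + S_1) - S_1 + S_2]$, a form inferred here from the platform conditions quoted in this section, and that its denominator is positive. The symbol $y^{*}$ is shorthand introduced for this illustration and is not the paper's notation.

```latex
\begin{align*}
y^{*}(x) &= \frac{C_1 - S_1}{x(-p_1 q + p_3 q + S_1) - S_1 + S_2}, \\
y^{*}(1) &= \frac{C_1 - S_1}{-p_1 q + p_3 q + S_2}, \\
y^{*}(1) < 1 &\iff C_1 - S_1 < -p_1 q + p_3 q + S_2
             \iff C_1 + p_1 q < p_3 q + S_1 + S_2 .
\end{align*}
```

Under these assumptions, $p_3 q$ enters only the larger side of the final inequality, so raising the penalty makes the supervision condition easier to satisfy, which is consistent with the remark on penalties in this section.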

D. STABILITY ANALYSIS OF EQUILIBRIUM POINTS
By separately analyzing the stable states of passengers, drivers, and the platform, it is possible to find ways to make each group tend toward the desired strategy. On this basis, this paper further analyzes the prerequisites for all three groups to be in the stable state through the equilibrium points and the Jacobian matrix. According to the concept of evolutionary equilibrium proposed by Hirshleifer [44], if every trajectory of a dynamic system that starts in a neighborhood of an equilibrium point eventually evolves toward that point, then the equilibrium point is the ESS. At the same time, because an asymptotically stable solution must be a strict Nash equilibrium [45], [46], this paper only considers the asymptotic stability of the pure strategy equilibrium points ($E_1$-$E_8$). By taking the partial derivatives of the replicator dynamics equations of the three game groups with respect to $x$, $y$, and $z$, the Jacobian matrix can be obtained as shown in (26).
The entries of (26) are the partial derivatives of the three replicator dynamics equations. The eigenvalues corresponding to each equilibrium point can be obtained from the Jacobian matrix [47], and the asymptotic stability of each equilibrium point is then analyzed, as shown in Table 3. From Table 3, it can be seen that only $E_1(0, 0, 0)$ and $E_4(0, 0, 1)$ can become the ESS; the phase diagram of these two equilibrium points is shown in Figure 4. The remaining 6 equilibrium points are saddle points. $E_1(0, 0, 0)$ represents the strategies in which passengers do not report, drivers do not violate regulations, and the platform does not supervise. The prerequisite for these three groups to achieve long-term stability is $-C_1 + S_1 < 0$, that is, the positive social effects generated when the ORH platform implements supervision are less than the supervision costs. This prerequisite does not match the actual situation, and this stable state is obviously not conducive to the sustainable development of the ORH industry.
The prerequisite for $E_4(0, 0, 1)$ to become the ESS is $C_1 - S_1 < 0$, that is, the positive social effects generated when the ORH platform implements supervision are greater than the supervision costs. This prerequisite prompts the ORH platform to implement supervision, and eventually leads drivers to evolve toward non-violation and passengers toward non-reporting. This equilibrium point is the final stable state expected by this paper, and the prerequisite for the ESS reflects the costs and benefits of the ORH platform when passengers make no reports and drivers commit no violations.
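Why the prerequisites for $E_1$ and $E_4$ are exact opposites can be seen from the structure of replicator dynamics. As a sketch, assume the platform's equation has the usual two-strategy form $dz/dt = z(1-z)\,G(x, y)$, where $G$ is shorthand introduced here for the platform's expected payoff gap between supervising and not supervising; it is not a symbol from the paper. The value $G(0,0) = -C_1 + S_1$ is taken from the prerequisite stated for $E_1$.

```latex
\begin{align*}
\frac{\partial}{\partial z}\bigl[z(1-z)\,G(x,y)\bigr] &= (1-2z)\,G(x,y), \\
\frac{\partial}{\partial x}\bigl[z(1-z)\,G(x,y)\bigr]
  &= z(1-z)\,\frac{\partial G}{\partial x} = 0 \quad \text{for } z \in \{0, 1\}, \\
\lambda_z\big|_{E_1(0,0,0)} &= G(0,0) = -C_1 + S_1, \\
\lambda_z\big|_{E_4(0,0,1)} &= -G(0,0) = C_1 - S_1 .
\end{align*}
```

The derivative with respect to $y$ vanishes in the same way. If all three replicator equations share this form, the Jacobian is diagonal at every pure strategy point and its eigenvalues are simply the diagonal entries. The platform's eigenvalue then changes sign between $z = 0$ and $z = 1$, so $E_1$ and $E_4$ cannot both be stable for the same values of $C_1$ and $S_1$.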
In addition, $E_6(1, 0, 1)$ is the decision state expected by this paper at the beginning of the evolution process. Eigenvalue analysis shows that the combination of passenger reporting, driver non-violation, and platform supervision cannot achieve long-term stability. When drivers do not violate the regulations, the platform can identify the false reports of passengers through supervision, thereby gradually reducing the passengers' reporting rate, so that the system eventually evolves into a state where passengers do not report and drivers commit no violations under the platform's supervision.

V. NUMERICAL SIMULATION OF ORH SUPERVISION

A. NUMERICAL SIMULATION OF THE INITIAL STATE
In order to verify the analysis results of the supervision evolutionary game model, the evolutionary paths of passengers, drivers, and the platform can be simulated numerically. For the purposes of simulation, the replicator dynamics equation of each game group is discretized to analyze the asymptotically stable trajectory of the evolutionary game. Let the time step be $\Delta t$; then (36)-(38) can be obtained from the definition of the derivative.
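The discretization described above amounts to a forward-Euler update of each proportion. The following is a minimal sketch, not the paper's Mathematica code. It assumes each replicator equation has the two-strategy form d(share)/dt = share * (1 - share) * gap. Because the passenger and driver payoff gaps are not reproduced here, the example freezes x and y and advances only the platform. All function names are illustrative, the `platform_gap` expression is inferred from the platform conditions stated in the strategy analysis, and the numbers are the baseline values reported for this section's simulation.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float, float]          # (x, y, z) proportions
Gap = Callable[[float, float, float], float]  # payoff gap of one group


def euler_step(state: State, gaps: Tuple[Gap, Gap, Gap], dt: float) -> State:
    """Advance (x, y, z) by one forward-Euler step of size dt."""
    new_state = []
    for share, gap in zip(state, gaps):
        # Two-strategy replicator dynamics for this group.
        derivative = share * (1.0 - share) * gap(*state)
        # Clamp to [0, 1] so rounding cannot push a proportion out of range.
        new_state.append(min(1.0, max(0.0, share + dt * derivative)))
    return (new_state[0], new_state[1], new_state[2])


def simulate(state: State, gaps: Tuple[Gap, Gap, Gap],
             dt: float, steps: int) -> List[State]:
    """Return the whole trajectory, including the initial state."""
    path = [state]
    for _ in range(steps):
        state = euler_step(state, gaps, dt)
        path.append(state)
    return path


def frozen(x: float, y: float, z: float) -> float:
    """Placeholder gap of zero: the group's proportion stays fixed."""
    return 0.0


def platform_gap(x: float, y: float, z: float) -> float:
    """ASSUMED platform payoff gap (supervise minus not supervise)."""
    p1q, p3q, C1, S1, S2 = 4.5, 6.5, 5.0, 5.5, 6.0
    return y * (x * (-p1q + p3q + S1) - S1 + S2) - C1 + S1


path = simulate((0.5, 0.5, 0.5), (frozen, frozen, platform_gap),
                dt=0.1, steps=200)
print("final state:", path[-1])
```

With x and y held at 0.5, the assumed gap is positive, so the supervision proportion z rises toward 1. Replacing `frozen` with the actual passenger and driver payoff gaps would give the full three-group trajectory.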
In accordance with (36)-(38), the influence of related parameters on the evolutionary game can be further studied by using Wolfram Mathematica 9. For the three game groups to finally achieve the ideal states, the initial parameter setting should meet $C_3 < p_1 q - p_2(Q - q)$, $C_1 + p_1 q < p_3 q + S_1 + S_2$, $C_1 < S_1$, and $S_1 < S_2$. The initial parameters are set as follows: $p_1 q = 4.5$, $p_2(Q - q) = 2.5$, $p_3 q = 6.5$, $C_1 = 5$, $C_3 = 0.4$, $L_2 = 1$, $S_1 = 5.5$, $S_2 = 6$. All the initial proportions of passengers, drivers, and the platform are set to 0.5, so as to objectively evaluate the evolution path of the ORH supervision game from a neutral starting point. The initial evolution path is shown in Figure 5. As can be seen from this figure, under the dual influences of the platform's supervision and drivers' compliance, passengers gradually evolve toward the strategy of not reporting. Drivers reach the stable strategy of no violation after about 2 periods, which means that there is no illegal behavior in the ORH market in the following periods. However, it takes about 12 periods for passengers to reach the stable strategy of not reporting, indicating that passengers are still actively participating in the supervision during periods 2-12. At the same time, passengers gradually evolve toward the strategy of not reporting as the number of wrong complaints pointed out by the platform increases. This evolutionary path is in line with the ideal decision-making of reporting by passengers, non-violation by drivers, and supervision by the platform. Finally, the best state is realized, in which drivers do not violate regulations and passengers do not complain under the supervision of the platform.
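The four prerequisites on the initial parameters can be checked mechanically against the chosen values. A small sketch in Python follows; the dictionary keys and the function name are illustrative and not from the paper, and `p2Qq` stands for the combined quantity $p_2(Q - q)$.

```python
def check_prerequisites(p: dict) -> dict:
    """Evaluate the four inequalities required of the initial parameters."""
    return {
        "C3 < p1q - p2(Q-q)": p["C3"] < p["p1q"] - p["p2Qq"],
        "C1 + p1q < p3q + S1 + S2": p["C1"] + p["p1q"] < p["p3q"] + p["S1"] + p["S2"],
        "C1 < S1": p["C1"] < p["S1"],
        "S1 < S2": p["S1"] < p["S2"],
    }


# Baseline values as listed in the text.
baseline = {
    "p1q": 4.5, "p2Qq": 2.5, "p3q": 6.5,
    "C1": 5.0, "C3": 0.4, "L2": 1.0,
    "S1": 5.5, "S2": 6.0,
}

results = check_prerequisites(baseline)
for name, holds in results.items():
    print(f"{name}: {holds}")
```

The same function can be reused when a single parameter is varied, to confirm that a new setting still satisfies the prerequisites before comparing evolution paths.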

B. NUMERICAL SIMULATION OF PARAMETER VARIATION
Among all the parameters, three each affect two groups in the game: the rewards passengers receive from the platform for reporting violations, $p_1 q$; the penalties drivers receive from the platform for violations, $p_3 q$; and the compromise gains passengers receive from drivers for not reporting violations, $p_2(Q - q)$. Therefore, the influence of each of these three parameters on the decision-making of participants is studied in turn in the following paragraphs.
For the analysis of these three parameters, this paper sets the initial proportion of passengers selecting reporting to 0.5, that is, the passenger group is initially neutral. The initial proportion of drivers selecting violation is set to 0.9, that is, the driver group initially prefers to violate the regulations to obtain illicit incomes. The initial proportion of the platform selecting supervision is set to 0.1, that is, the platform initially prefers not to supervise in order to reduce the corresponding costs.
Keeping other parameters and the initial proportions of the three game groups unchanged, the value of $p_1 q$ is increased to 6 and decreased to 3. Because this parameter is not included in the replicator dynamics equation of drivers, it does not change the drivers' evolution path. The evolution paths of passengers and the platform are therefore the main focus, as shown in Figure 6. As can be seen from Figure 6, when the rewards received from the platform are increased, passengers are encouraged to report violations more actively, but at the same time the proportion of passengers reporting false information to obtain more rewards also increases. Decreasing the reporting rewards accelerates the passengers' evolution toward the strategy of not reporting, but it also reduces the frequency of false reporting.
For the ORH platform, raising reporting rewards means an increase in regulatory costs, which leads to a slower evolution of the platform toward the supervision strategy. Conversely, reducing incentive costs makes the platform more inclined to choose the supervision strategy.
In addition, when $p_1 q$ is increased, the evolution path of passengers shows a smooth and slight upward phase, but the evolution path of the platform does not fluctuate significantly, which means that a change in $p_1 q$ has a greater impact on passengers' decisions than on the platform's decisions. The platform should therefore carefully weigh passenger incentives against cost acceptability when setting the reward amount.
Keeping other parameters and the initial proportions of the three game groups unchanged, the value of $p_3 q$ is increased to 10 and decreased to 5. Because this parameter is not included in the replicator dynamics equation of passengers, it does not change the passengers' evolution path. The evolution paths of drivers and the platform are therefore the main focus, as shown in Figure 7. As can be seen from Figure 7, when the platform reduces the punishment for violations, the evolution of drivers toward the non-violation strategy slows down, because the lower risk costs strengthen drivers' hope of escaping punishment by luck. Therefore, the platform should severely punish drivers who violate rules, which can not only restrict illegal behaviors of drivers but also offset part of the platform's supervision costs and promote the evolution of the two game groups toward the ideal decisions.
Keeping other parameters and the initial proportions of the three game groups unchanged, the value of $p_2(Q - q)$ is increased to 4 and decreased to 1. Because this parameter is not included in the replicator dynamics equation of the platform, it does not change the platform's evolution path. The evolution paths of drivers and passengers are therefore the main focus, as shown in Figure 8. As can be seen from Figure 8, when the compromise gains that offending drivers pay to passengers decrease, passengers tend to choose the reporting strategy, and the total time they participate in supervision is extended by two time periods compared with Figure 6 and Figure 7. At the same time, compared with the other parameter changes, the reduction of compromise gains is the only change that makes the reporting proportion of passengers higher than 0.5 in the early stage. Therefore, the reporting proportion of passengers is significantly negatively correlated with the compromise costs paid by drivers. When compromise gains increase, although passengers are more likely to cover up drivers' misconduct, the correspondingly higher violation costs cause drivers to evolve faster toward the non-violation strategy.
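The direction in which rewards and penalties shift the platform's incentive can be illustrated with a short calculation at the starting proportions used in this subsection (x = 0.5, y = 0.9). This is a sketch under a stated assumption: the expression in `platform_gap` is inferred from the platform conditions in the strategy analysis and is not quoted from the paper, and the function name is hypothetical. It evaluates only the incentive at the starting point, not the full evolution paths in Figures 6 and 7.

```python
def platform_gap(x, y, p1q=4.5, p3q=6.5, C1=5.0, S1=5.5, S2=6.0):
    """ASSUMED payoff gap of supervising versus not supervising."""
    return y * (x * (-p1q + p3q + S1) - S1 + S2) - C1 + S1


x0, y0 = 0.5, 0.9  # initial reporting and violation proportions

# Vary the reporting reward p1q (3, baseline 4.5, 6).
reward_gaps = [platform_gap(x0, y0, p1q=v) for v in (3.0, 4.5, 6.0)]

# Vary the violation penalty p3q (5, baseline 6.5, 10).
penalty_gaps = [platform_gap(x0, y0, p3q=v) for v in (5.0, 6.5, 10.0)]

print("gap as p1q rises:", reward_gaps)
print("gap as p3q rises:", penalty_gaps)
```

Under this assumed form, the gap shrinks as the reward grows and widens as the penalty grows, which matches the qualitative reading above. The compromise gain $p_2(Q - q)$ does not appear in the function, in line with the statement that it leaves the platform's path unchanged.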

VI. CONCLUSION
The ORH industry plays a crucial role in developing the sharing economy, improving resource allocation, and enriching travel modes. However, the violations of some ORH drivers have disrupted the healthy operation of the market and the normal order of society. Because most violations occur in the presence of passengers, building active supervision that involves the platform, passengers, and drivers can reduce the occurrence of problems at the source through mutual restriction and cooperation.

A. RESEARCH CONCLUSIONS
On the premise of the bounded rationality of the decision-making participants, this paper constructs the tripartite evolutionary game model by considering the benefits of the ORH platform, drivers, and passengers. The analysis of the evolutionary stability strategy of each game group and the eigenvalues of each equilibrium point shows that the effectiveness and sustainability of ORH supervision are largely determined by the reward amount and the reporting cost of passengers, the supervision cost and the social benefit of the platform, and the punishment degree and the violation risk of drivers.
At the same time, in order to further study the evolutionary path of the stable state, this paper analyzes how different parameters change the evolution behaviors of the three game groups by numerical simulation, and draws the following two conclusions.
First, no matter how the relevant parameters change, passengers ultimately evolve toward a non-reporting strategy. Although raising reporting rewards can prolong passengers' participation, it can hardly produce a large increase in passengers' enthusiasm, which shows that the platform is still the core pillar of ORH supervision and that the passenger group can only play a supporting role.
Second, when both drivers and the platform reach the ideal stable states, passengers still have a period of decision evolution. The reason why passengers still choose the reporting strategy when drivers do not violate the regulations may be that some passengers who want to get the rewards misjudge the drivers' compliance, and the duration of this situation is positively correlated with the amount of reporting rewards. Therefore, the platform should take the cost of detecting false information into consideration when it designs reporting rewards.

B. IMPROVEMENT SUGGESTIONS
As the implementer of ORH supervision, the platform should ensure the stable development of the internet passenger transport industry from the inside out. Therefore, the following two suggestions are proposed based on the research results of this paper.
On the one hand, the platform should increase the rewards for true reports, such as offering large discounts on fares and free upgrades to higher-grade cars. Punishments for drivers who violate regulations should be strengthened comprehensively. The platform should set different levels of fines and directly dismiss serious violators, so that they can neither earn income nor enjoy travel services through the platform.
On the other hand, the platform should build good user relationships, and the users referred to here are not only passengers but also drivers. While strengthening the management of service terminals and maintaining long-term good passenger experiences, the platform should also safeguard the legal rights of drivers and guarantee their information security. The platform can establish special complaint mechanisms for drivers and passengers respectively, and improve the whole complaint channel while reducing the cost of user complaints, so that each complaint can be solved quickly and fairly and the information and personal safety of users are protected at the source.

C. FUTURE RESEARCH DIRECTIONS
Because the behaviors and decisions of participants change during the complex evolution process of ORH services, future studies need to further consider psychological variables such as the willingness of passengers, the subjective norms of drivers, and the perceived value of the platform. At the same time, most current game studies on ORH supervision only include the government and the platform. In the future, third-party regulatory agencies or associations of ORH enterprises can be included in the regulatory research, so as to further analyze the game strategies of the ORH industry.