Effect of Optimization Time-Scale on Learning-Based Cooperative Merging Control at a Nonsignalized Intersection

Automated driving and the widespread use of large-scale communication infrastructure are expected to facilitate highly cooperative driving. Although considerable research has focused on developing efficient cooperative control methods for nonsignalized intersections, the effect of cooperative control for conflicting target vehicles on future traffic flow is yet to be investigated. Therefore, we aim to investigate whether the impact of such cooperative control on future traffic should be considered. We established a traffic simulator and several machine-learning methods to select the optimal cooperative method. The decision tree and deep neural network were trained on two indices that evaluate short-term/long-term predictive control: to minimize the travel time of the 1) conflicting vehicles and 2) all vehicles including future traffic flow. Simulation analysis results indicated that there were no significant differences in the total travel times between these indices. This finding indicates that efficient traffic flow, which includes future traffic flow, is achievable by short-term cooperative control methods that can be established easily.


I. INTRODUCTION
In recent years, the development of automated driving technology and of low-latency, high-capacity communication infrastructure has progressed rapidly. These advancements are expected to help realize connected autonomous vehicles (CAVs), which are expected to play a leading role in next-generation transportation systems. CAVs can acquire and share highly accurate, wide-area traffic information in real time through vehicle-to-vehicle and vehicle-to-infrastructure communication. Therefore, unlike human-driven vehicles, CAVs are expected to perform cooperative driving that considers the locations of other vehicles in the road network.
The associate editor coordinating the review of this manuscript and approving it for publication was Jie Gao.
Recent studies expanded these methods to calculate optimal vehicle control solutions using mathematical methods for further improving the efficiency of traffic flow. A common method is to mathematically model the behavior of individual vehicles and formulate optimization problems to minimize travel time and/or avoid traffic problems. Although this method has been used in many studies, it has two limitations. The first is the computation time required to obtain the optimal solution. It is not easy to mathematically solve the problem of optimizing complex traffic maneuvers with high traffic volumes, and many studies have relied on numerical solutions that require considerable computation time. Some researchers have approached this problem using model predictive control (MPC), which can solve optimization problems in response to changing traffic conditions in real time [12], [13], [14], [15], [16], [17]. The second is that it is difficult to formulate and solve optimization problems that consider the long-term impact of vehicle control on the overall traffic flow. Because of the computational load, the mathematical modeling of traffic flow can describe only the vehicles surrounding an intersection at the time a conflict occurs.
Thus, most existing studies have focused only on the short-term optimization of conflicting and surrounding vehicles. To the best of the authors' knowledge, no studies have discussed the differences in the effectiveness of long-term and short-term optimizations.
In this paper, we discuss this difference in the optimization scale by using a learning-based approach that refers to a simulation-based approach proposed by Tashiro et al. [24], which considers the long-term effects with a small computational load.
The remainder of this paper is organized as follows: In Section II, the literature on cooperative intersection management is reviewed, and the research hypothesis of the current study is stated. In Section III, we explain the learning-based cooperative merging control method and traffic simulator utilized in this study. Section IV discusses the differences between the two optimization scales in cooperative merging control. Finally, Section V concludes the paper and discusses future work.

II. LITERATURE REVIEW AND PROBLEM STATEMENT
Research on cooperative intersection control can be divided into two categories: proposals for new merging control approaches and proposals for solutions with improved efficiency using existing approaches. Previous studies [1], [2], [3], [4] have reported methods that consider the vehicles on multiple roads as virtual platoons based on the distance from an intersection. Uno et al. [1] proposed a method for placing virtual vehicles in different lanes in an expressway merging area; the vehicles in each lane maintain the same distance from each other and change lanes as the real vehicles. Li and Wang, and Xu et al. [2], [3] considered the vehicles on roads connected to an intersection as virtual platoons. A group of vehicles that enter an intersection simultaneously is defined as a generation within the virtual platoon; the platoon is adjusted such that no vehicles in the same generation overlap within the intersection.
In the reservation-based approach [5], [6], [7], [8], [9], the intersections are represented by cells, and each vehicle communicates with the central controller to reserve a cell location for the time it plans to pass. The central controller manages the reservation such that no two vehicles are in the same cell simultaneously. Dresner and Stone [5] determined the priority of vehicles using only their distance from the intersection (first-come, first-served). In other studies [6], [7], [8], [9], the authors proposed a numerical method for determining the priority of each vehicle that minimizes the travel time to achieve more efficient reservation-based intersection control.
Furthermore, mathematical modeling approaches that formulate optimization problems have been studied [10], [11], [12], [13], [14], [15], [16]. The optimization problem is formulated to minimize the vehicle travel time and the crossing time within an intersection; it takes vehicle positions and speeds as inputs and is solved numerically under constraints such as collision avoidance and speed limits. Lee and Park [10] solved the optimization problem for a single intersection using a genetic algorithm; they showed that the vehicle stopping time and fuel consumption were reduced. Zhang et al. [11] formulated an optimization problem for multiple intersections. Other studies [12], [13], [14], [15], [16], [17] have proposed vehicle control methods using MPC, where the MPC-based intersection control predicts the traffic condition of the next timestep based on the acceleration/deceleration control given to the vehicle and solves the optimization problem to obtain the optimal vehicle control for each timestep. The formulation of the optimization problem was further used for improving efficiency in the two rule-based approaches previously discussed [1], [2], [3], [4], [5], [6], [7], [8], [9]; however, this approach optimizes vehicle behavior more directly.
Meanwhile, mathematical modeling approaches include studies similar to optimization in the reservation-based approach. The scheduling problem for vehicles entering an intersection was formulated as a mixed-integer linear program (MILP) [18], [19], [20]. Fayazi et al. [18] solved the formulated MILP numerically to determine the intersection access time for vehicles, which significantly reduced the number of vehicle stops and delays. Ashtiani et al. [19] applied this method to multiple intersections. Fayazi and Vahidi [20] analyzed the effect of vehicle control with MILP in a closed, real-world experimental field.
Other studies [21], [22], [23] applied a machine-learning approach for cooperative control at an intersection. Wu et al. and Guo et al. [21], [22] employed reinforcement learning using a Q-learning-based model for the cooperative control of each vehicle. Similarly, [23] investigated the applicability of deep Q-networks for cooperative control management at intersections. These studies focused on vehicles in control zones near intersections and optimized the travel of these vehicles from the time they enter the control zone until they exit.
Although various cooperative merging control methods have been proposed, most of them target only the controlled vehicles in the communication range of the intersection controller. These studies evaluate controlled vehicles based on the travel time or fuel consumption of those vehicles when passing through the intersection. It is difficult to consider vehicles located far from the intersection (e.g., vehicles entering a communication area after several minutes) in their optimization methods because of the enormous computation time requirement. Past studies have not evaluated the impact of vehicle control on future traffic, and there is a lack of discussion regarding their necessity.
This study investigated the difference between the following two methods for cooperative merging control by focusing on travel time reduction:
(1) Short-term optimization, which is used in previous studies and focuses on optimizing the travel time of conflicting vehicles in the control zone.
(2) Long-term optimization, which evaluates future traffic, including vehicles entering the intersection in the future.
Long-term optimization is hypothesized to be better than short-term optimization; however, if their performances are similar, it is sufficient to only consider vehicles in the control zone and/or conflicting vehicles, while ignoring the influence of cooperative control on the upstream future traffic flow. The contribution of this study lies in clarifying this aspect.

III. METHODOLOGY
A micro traffic-flow simulator based on the NaSch model [25], [26], a cellular automaton, was used to implement and evaluate the efficiency of the cooperative merging control.

A. MICRO TRAFFIC SIMULATOR
The cell size was set to 2.5 m with a time step of 1 s. At each time step, the next position and speed were determined based on the current speeds and positions of the target and forwarding vehicles; these values were updated sequentially. The average length of a standard car (4-5 m) and the minimum vehicle gap (2.5 m) were assumed; the vehicles were placed in three cells (the two cells in front were occupied by vehicles and the cell following them was the minimum inter-vehicle distance).
The speed change of each vehicle follows an if-then rule defined in terms of three quantities for a vehicle at time step $t$: $v_t$ represents the speed, $gap_t$ represents the distance to the forwarding vehicle in the travel lane (forward gap), and $v_{max}$ represents the maximum speed on the link on which the vehicle is traveling. Under this rule, the vehicle travels at the maximum speed allowable without collision with the forwarding vehicle. Vehicles turning at intersections slow down to 9 km/h (1 cell/s) as they enter the intersection. Another important rule for vehicle maneuvers is lane changing. Vehicles cannot change lanes unless there is a sufficient gap between the target vehicle and the vehicles ahead of and behind it in the adjacent lane. Since vehicles turning at an intersection ought to be in the appropriate lane, the gap requirement is relaxed as the vehicle approaches the intersection. This rule is based on that proposed by Nagel et al. [26], and its details are presented in the appendix.
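As a rough illustration of the speed rule described above, the following Python sketch assumes a deterministic NaSch-style update: a vehicle accelerates by at most 1 cell/s per step and never exceeds either the forward gap or the link's maximum speed. The function name `next_speed` and the exact form of the rule are assumptions made for illustration, not the simulator's actual code.

```python
def next_speed(v_t: int, gap_t: int, v_max: int, accel: int = 1) -> int:
    """Assumed deterministic NaSch-style speed update.

    Speeds are in cells per time step and the gap is in cells.
    """
    # Accelerate by at most `accel`, stay within the link speed limit,
    # and never travel farther than the free cells ahead.
    return max(0, min(v_t + accel, v_max, gap_t))


# A free-flowing vehicle speeds up; a blocked vehicle is limited by the gap.
free_flow = next_speed(v_t=3, gap_t=10, v_max=6)
blocked = next_speed(v_t=5, gap_t=2, v_max=6)
```

With a 2.5 m cell and a 1 s time step, a speed of 6 cells per step corresponds to the 54 km/h limit of the lateral links.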

1) ROAD NETWORK FOR SIMULATION
The road network used in this study is illustrated in Fig. 1. All links are 500 m long (200 cells); the lateral links have two lanes per side, and the longitudinal link has one lane per side. The maximum speed is 54 km/h (6 cells/s) on the lateral links and 36 km/h (4 cells/s) on the longitudinal links. That is, the network represents a nonsignalized intersection in which a slow longitudinal branch line is connected to a fast lateral main line. We define a nonsignalized intersection as a small intersection that has no traffic signal and can be controlled by a STOP sign.
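The cell-based figures quoted above follow directly from the 2.5 m cell size and the 1 s time step. The short Python sketch below reproduces that arithmetic; the helper names are hypothetical.

```python
CELL_M = 2.5   # cell length in metres (from the text)
STEP_S = 1.0   # simulation time step in seconds (from the text)


def kmh_to_cells_per_step(kmh: float) -> float:
    """Convert a speed in km/h to cells per simulation step."""
    metres_per_second = kmh * 1000.0 / 3600.0
    return metres_per_second * STEP_S / CELL_M


def metres_to_cells(metres: float) -> float:
    """Convert a distance in metres to a number of cells."""
    return metres / CELL_M


lateral_v_max = kmh_to_cells_per_step(54)       # lateral (priority) links
longitudinal_v_max = kmh_to_cells_per_step(36)  # longitudinal (nonpriority) links
link_length_cells = metres_to_cells(500)
```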
Furthermore, Fig. 1 shows the paths of the vehicles at the intersection, and the dots represent the conflicting points.

2) PRIORITY CONTROL
Priority/nonpriority control (referred to as STOP sign control) is considered as a control option against which the merging control methods are compared. In priority/nonpriority control, the lateral main line is set as the priority road (link), and the longitudinal line is set as the nonpriority road (link) with a STOP sign. A vehicle making a left/right turn at the intersection pauses at the end of its link and then enters the intersection only if every higher-priority vehicle whose path overlaps its own within the intersection is at least 3 s from arriving at the intersection.

B. COOPERATIVE MERGING CONTROL
All vehicles are automated vehicles equipped with V2X functions; there is no uncertainty in vehicle speed management; the communication environment is fast; and there is no delay. We adopted cooperative control methods proposed by Medina et al. [19] and established the training dataset to build an optimal control selection model with machine learning. In this methodology, a conflict is defined as a scenario wherein a vehicle is forced to wait before entering an intersection because of the presence of another vehicle with a higher priority. A simulation-based approach searches for the optimal solution, which is a combination of control candidates given to each conflicting vehicle through future forecasting simulations.
In Fig. 2, $t$ is the current time, and the traffic scenario at a future time $t'$ is the one that would be realized if a given solution were applied, as predicted through the simulation. The future time can be written as $t' = t + t_f$, where $t_f$ refers to the optimization time length. $t_f$ was set to a small value for short-term optimization and to a large value for long-term optimization; it varies so as to reach the future time $t'$ set in each optimization method. The prediction is conducted for all possible control candidates. The performance of each solution is compared, and the most effective combination of control candidates is selected as the optimal solution and applied to the vehicles at the next time step $t + 1$.
However, this simulation-based approach is computationally time-consuming, because the optimal solution is selected using a brute-force search. We therefore modeled the relationship between the traffic situation at $t$ and the optimal solution using machine learning. Machine learning requires a large computation time and exerts a high computational load when the model is first built and optimized. However, the subsequent classification problems can be solved quickly with a low computational load. The method for predicting future conflicts, the vehicle control method, and the indices for selecting the optimal solution are explained in Section B.1, before explaining the establishment of machine-learning models in Section C.
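The search described above can be sketched as a simple loop over all control combinations. In the Python sketch below, `simulate` stands in for the forecasting simulation up to the future time and `index` for the evaluation index; both are hypothetical callables supplied by the caller, and the toy numbers are made up purely for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

S = TypeVar("S")  # traffic state at the current time
C = TypeVar("C")  # one combination of control candidates
F = TypeVar("F")  # predicted future state


def brute_force_select(
    state: S,
    candidates: Iterable[C],
    simulate: Callable[[S, C], F],
    index: Callable[[F], float],
) -> Tuple[Optional[C], float]:
    """Simulate every candidate and keep the one with the smallest index."""
    best: Optional[C] = None
    best_score = float("inf")
    for cand in candidates:
        score = index(simulate(state, cand))
        if score < best_score:
            best, best_score = cand, score
    return best, best_score


# Toy stand-in: the "simulation" just adds a made-up delay per candidate.
toy_delay = {"N-N": 12.0, "A-L": 7.0, "N-L": 9.0}
best, score = brute_force_select(
    state=10.0,
    candidates=toy_delay,
    simulate=lambda s, c: s + toy_delay[c],
    index=lambda future: future,
)
```

The cost of this loop grows with the number of candidates times the length of each forecast, which is what makes a large optimization time length expensive.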

1) CONFLICT PREDICTION AND CONTROL METHOD
Every time a vehicle passes the conflict prediction line on a link that flows into an intersection, a simulation of future situations is used to predict whether a conflict will occur before the vehicle passes the intersection. If a conflict is predicted, the two vehicles facing the conflict are recorded as vehicles to be controlled. The conflict prediction line is set at 120 m from the intersection on a priority link and at 80 m on the nonpriority link, based on the results of preliminary testing and the maximum speed of each link (see Figure 3).
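A quick arithmetic check, sketched below in Python, shows how the two line positions relate to the link speed limits: at the respective maximum speeds, both lines lie about 8 s from the intersection. This is only an observation derived from the stated numbers; the helper name is hypothetical, and the text attributes the actual choice to preliminary testing as well.

```python
def seconds_to_intersection(distance_m: float, speed_kmh: float) -> float:
    """Travel time to the intersection at a constant speed."""
    return distance_m / (speed_kmh * 1000.0 / 3600.0)


priority_lead_time = seconds_to_intersection(120, 54)     # priority link
nonpriority_lead_time = seconds_to_intersection(80, 36)   # nonpriority link
```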
The optimal solution for control is investigated if the conflict between two vehicles is predicted; these vehicles are instructed on how to drive to the intersection. The vehicle control method adopted in this study does not provide detailed commands regarding the momentary acceleration/deceleration changes; instead, it instructs the basic driving method.

2) VEHICLE CONTROL CANDIDATES
An optimal solution for controlling the two conflicting vehicles is found by a search that simulates every pair of vehicle control candidates and evaluates the obtained results.
There are five vehicle control candidates: lane change (L), high-speed driving (A), medium-high speed (M1), medium-low speed (M2), and low speed (D). In addition, no control (N), which follows the priority/nonpriority rule of the road, is treated as a control candidate. A total of 26 control combinations are created by combining these six control candidates (Table 1). Combinations that give the same control to both controlled vehicles, such as A-A (high speed-high speed), are not implemented, because they leave the difference between the intersection arrival times of the two vehicles unchanged and are clearly inefficient for resolving conflicts. However, the N-N combination (following priority/nonpriority control) is retained. Lane change control is applied only to a vehicle that is cruising on a multilane road (lateral link).
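The count of 26 can be reproduced by enumeration under one assumption that the text does not state explicitly: each conflicting pair consists of one vehicle on the single-lane (longitudinal) link, which cannot receive the lane change control, and one vehicle on the multilane (lateral) link. The Python sketch below is illustrative only; the ordering of each pair is a choice made here, and Table 1 remains the authoritative list.

```python
from itertools import product

SPEED_CONTROLS = ["A", "M1", "M2", "D"]     # high, medium-high, medium-low, low
SINGLE_LANE = SPEED_CONTROLS + ["N"]        # assumed: no lane change available
MULTILANE = SPEED_CONTROLS + ["L", "N"]     # lane change allowed


def control_combinations():
    """Enumerate (single-lane vehicle, multilane vehicle) control pairs."""
    combos = []
    for first, second in product(SINGLE_LANE, MULTILANE):
        # Identical controls are dropped, except N-N, which is retained.
        if first == second and first != "N":
            continue
        combos.append((first, second))
    return combos


combos = control_combinations()
```

Under this assumption there are 5 x 6 = 30 raw pairs, and removing A-A, M1-M1, M2-M2, and D-D leaves 26.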
Vehicle control is performed when the vehicle is in the control zone; it is defined as the area from the conflict prediction line to the terminal of the link (priority link: 120 m, nonpriority link: 80 m), as shown in Figure 3.
The control ends when one of the controlled vehicles passes through the intersection. Control is not executed if the given control causes a collision with other surrounding vehicles. In addition, lane change control is applied only when it does not cause the vehicle to deviate from the planned route. Table 2 summarizes the speed instructions under the control conditions. The speed ranges from 9 to 36 km/h (1-4 cells/s) on the nonpriority link and from 18 to 54 km/h (2-6 cells/s) on the priority link. In all cases, the vehicle acceleration and deceleration rates were constant at 1 and −1 cell/s², respectively. As stated in the priority/nonpriority rule, a vehicle that turns left or right at an intersection ought to decelerate to a speed of 9 km/h (1 cell/s) before entering the intersection. To realize the smooth merging of cooperating vehicles, the possibility of a link transition is investigated when the vehicle reaches 7.5 m from the intersection. The vehicle can enter the intersection without stopping if link transition is possible.

3) INDICES FOR SELECTING THE CONTROL CANDIDATE
The first index is the conflicting vehicles' travel-time minimization index (CVT), which aims at short-term optimization. This index evaluates the travel time of only the two conflicting vehicles, and it does not consider the effect on the surrounding vehicles and future traffic flow. The index is intuitive, and it is calculated from $t_A(n)$ and $t_B(n)$, which represent the times from conflict prediction to intersection exit of the two conflicting vehicles for a control candidate $n$. The second index is the future total time minimization index (FTT), which aims at the long-term optimization of traffic flow. This index evaluates the effect of conflict resolution on future traffic flow as well, and it aims to reduce the total travel time for the entire traffic flow. Therefore, the FTT is defined over the travel times $t_i(n)$ of vehicles $i = 1, \ldots, c_{max}$, where, for a control candidate $n$, $t_i(n)$ represents the travel time of vehicle $i$ until it exits the network and $c_{max}$ represents the sum of the number of vehicles present in the network at the time of conflict prediction and the number of vehicles that will enter the network subsequently. The control candidate $n$ that minimizes the chosen index is selected as the optimal solution. The future time (or total number of vehicles, $c_{max}$) over which the travel time is evaluated should be sufficiently long (or large) for the effects caused by cooperative control to diminish; it is set at the end of the simulation, because the investigation of this range is beyond the scope of this study.
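To make the contrast between the two indices concrete, the Python sketch below assumes that CVT is the sum of the two conflicting vehicles' times and that FTT is the sum of the travel times of all evaluated vehicles. Both functional forms are assumptions inferred from the index names and descriptions, and the travel times are made-up toy values, not simulation results.

```python
def cvt(t_a: float, t_b: float) -> float:
    """Assumed short-term index: total time of the two conflicting vehicles."""
    return t_a + t_b


def ftt(travel_times) -> float:
    """Assumed long-term index: total travel time of all evaluated vehicles."""
    return sum(travel_times)


# Hypothetical predicted times (s) for two control combinations.
predicted = {
    "A-N": {"pair": (20.0, 26.0), "all": [20.0, 26.0, 60.0, 62.0]},
    "N-L": {"pair": (22.0, 25.0), "all": [22.0, 25.0, 55.0, 58.0]},
}

best_by_cvt = min(predicted, key=lambda n: cvt(*predicted[n]["pair"]))
best_by_ftt = min(predicted, key=lambda n: ftt(predicted[n]["all"]))
```

In this toy case the two indices pick different combinations, which is the kind of disagreement the study sets out to quantify.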

C. MACHINE LEARNING
We used decision tree (DT) and deep neural network (DNN), which can perform multiclass classification, to construct a model that selects an optimal solution for solving this problem.

1) TRAFFIC CONDITIONS IN TRAINING DATA
The training data must contain various traffic scenarios for training a model that can manage diverse traffic conditions. The flow rate (FR) (unit: vehicles/10 min) on the priority link and the turn rate (TR) from the priority link to the nonpriority link are controlled as shown in Fig. 4. TR determines the number of vehicles that enter from the nonpriority link to priority links; its volume becomes 2XY vehicles/10 min. In this study, five patterns of FR (80, 90, 100, 110, and 120 vehicles/10 min) and five patterns of TR (10%, 20%, 30%, 40%, and 50%) were set to create various traffic conditions in training data.
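The grid of training conditions can be written out explicitly. The Python sketch below assumes that X and Y in the expression 2XY denote FR and TR (expressed as a fraction), which is an interpretation of the labels in Fig. 4 rather than something stated in the text; the function and key names are hypothetical.

```python
from itertools import product

FLOW_RATES = [80, 90, 100, 110, 120]          # FR, vehicles/10 min
TURN_RATES = [0.10, 0.20, 0.30, 0.40, 0.50]   # TR as a fraction


def training_conditions():
    """List every FR/TR pattern with the assumed nonpriority inflow 2*FR*TR."""
    return [
        {"FR": fr, "TR": tr, "nonpriority_inflow": 2 * fr * tr}
        for fr, tr in product(FLOW_RATES, TURN_RATES)
    ]


conditions = training_conditions()
```

Under this reading, FR = 100 and TR = 50% would give 100 vehicles/10 min entering from the nonpriority link.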

2) SUCCESSIVE CONFLICT CONTROL
In a congested traffic flow, further conflicts involving the following vehicles arise more frequently while a pair of conflicting vehicles is still being controlled. In such cases, controlling the conflicting vehicles behind may require more specific control because of the unusual behavior of the controlled vehicles ahead. For example, while the vehicle in front is controlled to drive at a low speed, it is impossible for the following vehicle to drive at a high speed unless there is a sufficient gap between these vehicles. Therefore, when cooperative control is implemented successively, it is necessary to add the presence of the controlled vehicle in front and its control method to the set of explanatory variables when establishing the model. In this study, training data were first generated from the optimal controls found by a brute-force search in an environment with no other controlled vehicles. These training data are called the "first training data" in this study. Subsequently, further training data were generated by a brute-force search in scenarios in which a pair of controlled vehicles is present ahead and the control solution for that pair is selected by the model established using the first training data. These are called the "second training data." Figure 5 shows the difference in the situations considered between these two sets of training data.

3) MODEL ESTABLISHMENT
Optimal solution selection models were established using the traffic situation at the current time $t$, at which the conflict is detected, as the predictors (explanatory variables). Table 3 lists the predictors. The predictor uses five pieces of information as "vehicle data": the vehicle's link, lane, cell, speed, travel direction, and grace time until the predicted conflict occurs. In addition, six types of data were defined as "traffic condition data" to represent the traffic conditions in the network. These included the number of vehicles, traffic density, spatial average speed, the percentage of vehicles traveling straight, and the percentages of vehicles turning right and left. For the first training data, the predictors comprised the vehicle data of the two conflicting vehicles and the traffic condition data for each lane of each link in the network, separated into five distance ranges from the intersection (25 m, 50 m, 100 m, 125 m, and 250 m). This resulted in 2,470 predictors. For the second training data, we added information regarding the conflicting vehicles ahead, the control applied to each of them, etc., to the predictors. This makes the total number of predictors 2,485; the added predictor values are set to 0 if there is no controlled vehicle ahead.

A. MODEL TRAINED RESULT
The models were trained and tuned using MATLAB. The models were optimized based on the classification error from k-fold cross-validation to prevent over-training. In this study, each model was trained with k = 5, and the model with the smallest average classification error was adopted. The DT models were built using the classification and regression trees (CART) algorithm. The DNN models, optimized by the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm, comprised four layers with the rectified linear unit as the activation function.
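The model-selection criterion, the average classification error over k = 5 folds, can be sketched in a few lines. The Python code below is a generic illustration and not the MATLAB workflow used in the study; the fold assignment, the function names, and the trivial majority-class learner are all assumptions chosen to keep the example self-contained. It expects at least k samples so that no fold is empty.

```python
import random
from collections import Counter


def k_fold_indices(n: int, k: int, seed: int = 0):
    """Shuffle the sample indices and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_error(X, y, fit, k: int = 5, seed: int = 0) -> float:
    """Average held-out classification error over k folds."""
    errors = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(X[i]) != y[i] for i in held_out)
        errors.append(wrong / len(held_out))
    return sum(errors) / k


def fit_majority(X_train, y_train):
    """Placeholder learner: always predicts the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label


X = list(range(20))
y = ["N-N"] * 15 + ["A-L"] * 5
baseline_error = cross_val_error(X, y, fit_majority, k=5)
```

Any classifier that exposes the same `fit` interface could be passed in place of `fit_majority`, and candidate models would then be compared by their returned error.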

1) CLASSIFICATION ACCURACY
The training results based on the CVT are summarized in Table 4. Focusing on the differences in machine-learning methods, DT showed smaller classification errors compared to those of the DNN. The efficient selection of predictors is important, because the number of predictors used in this study is large (2,485 predictors). In the DT model, the predictors are used in the order of their effect on classification, whereas in the DNN, all predictors are used simultaneously.
DNN showed a lower accuracy because of the presence of many unimportant predictors. In addition, Tables 4 and 5 show that the classification accuracy degrades for the second training data, which implies that the selection of the optimal control method becomes difficult when there are other controlled vehicles ahead. Subsequently, the optimization indices were compared. Table 5 presents the results of DT based on the FTT. A comparison of Table 5 with Table 4 indicates that FTT has a lower accuracy than CVT; this is reasonable because the FTT evaluates the travel time of vehicles that will enter the network in the future. That is, an optimal solution of CVT is based on the traffic situation for a few seconds after the conflict is predicted, whereas that of the FTT must consider vehicles that enter the intersection after the controlled vehicle has passed through it and is affected by the traffic situation up to several minutes after the conflict is predicted. The predictors, however, were collected only from the current scenario. Therefore, the model based on FTT is considered to have a lower prediction accuracy. Figure 6 shows the percentage of each solution in the 2nd training data and that selected by DT. The cases presented correspond to an FR value of 100 vehicles/10 min, while TR was varied from 10% to 50%.

2) TENDENCY OF OPTIMAL SOLUTION
The comparison of the training data indicated that the change in the optimal solution is similar between the CVT and FTT. In addition, for both CVT and FTT, the higher the TR, the more diverse are the optimal solutions. The complexity of the optimal solutions in the FTT index increases with an increase in TR. The composition of the optimal solution in the FTT was complex, as it was composed of more solutions than that in the CVT.
For example, in the CVT training data, the top five control solutions with the highest percentage ("high speed-lane change (A-L)," "high speed-without control (A-N)," "without control-lane change (N-L)," "without control-medium-high speed (N-M1)," and "without control-without control (N-N)") accounted for approximately 90% of the total regardless of the TR; in the FTT training data, the percentage of these top five control solutions decreased to approximately 70% for the more congested situation with TR = 50%. Therefore, the complexity of the optimal solution in FTT is the cause of the deterioration of the classification accuracy observed in Tables 4 and 5.
The most frequently and second most frequently applied optimal solutions, which are A-L and N-L, respectively, include lane changes. That is, lane change can solve conflicts most efficiently; however, the composition ratio of these solutions decreases with an increasing turning ratio on the arterial links (TR).
A comparison between the optimal solution in the training data and that selected by DT indicated that DT can predict the basic tendency of change in the optimal solution with changes in the TR. However, optimal solutions with a significantly low application percentage were not reproduced well for either index. Furthermore, the complexity of the composition of solutions in FTT was not reproduced by DT, although the simple composition of solutions in CVT was better reproduced. Thus, Fig. 6 illustrates the difficulty of learning the optimal solution based on FTT.

B. RESULT OF COOPERATIVE CONTROL APPLICATION
A verification test was performed for various traffic scenarios to investigate the effect of optimal cooperative control. In the following analysis, the optimal control solution is selected by DT, and the traffic situation is set identical to that in the training data. For each traffic volume pattern, 100 scenarios were simulated with different random seeds for vehicle generation. The effectiveness of the optimal control solution selection by DT was evaluated using the index $R_a$, defined for each simulated traffic situation $a$ (a combination of FR and TR), where, for scenario $i$, $n_{a,i}$ represents the total number of vehicles; $T_{0,i}$ and $T_{1,i}$ represent the average total travel time without and with merging control, respectively; and $I$ represents the total number of scenarios (100 in this study). $T_{0,i}$ and $T_{1,i}$ are given in seconds (s); therefore, $R_a$ represents the average travel-time reduction per vehicle (s/vehicle) for each case. The simulation was conducted over a 20-min period, and the vehicles entering the network during the 10-min period from 5 to 15 min after the start of the simulation were evaluated. The results are presented in Fig. 7. The reduction in travel time is largest for the highest FR (= 120 vehicles/10 min) and TR (= 50%). In particular, an increase in TR contributes to an increase in the effect of cooperative control, which implies that the more congested and complex the traffic situation is, the greater is the effect of cooperative control.
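One plausible reading of this evaluation, consistent with the stated unit of s/vehicle, is that the per-vehicle saving in each scenario, (T_0,i − T_1,i) / n_a,i, is averaged over the I scenarios. The Python sketch below implements that reading; the formula is an assumption inferred from the variable definitions, and the input numbers are hypothetical.

```python
def travel_time_reduction(scenarios) -> float:
    """Assumed R_a: mean over scenarios of (T0 - T1) / n.

    `scenarios` is a sequence of (n_vehicles, total_time_without_control,
    total_time_with_control) tuples, with times in seconds.
    """
    per_vehicle = [(t0 - t1) / n for n, t0, t1 in scenarios]
    return sum(per_vehicle) / len(per_vehicle)


# Two hypothetical scenarios for one FR/TR pattern.
toy_scenarios = [(100, 5000.0, 4500.0), (200, 9000.0, 8600.0)]
r_a = travel_time_reduction(toy_scenarios)
```

A positive value means that merging control shortened travel on average; in the study, 100 seeds per traffic pattern would take the place of the two toy scenarios.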

C. DISCUSSION OF COOPERATIVE CONTROL EFFECT
A comparison among the optimal control selection models indicates that all models show a similar effect in travel-time reduction; however, DTs established based on FTT show a slightly better effect, which implies that the classification error of the established models does not exhibit a large negative impact on cooperative control. That is, travel-time reduction may still be possible even if the solution selected by the model is not the most effective control. A better FTT index should improve the average travel time for all vehicles. However, as previously mentioned, it is considerably difficult to obtain a high classification performance based on the FTT. Therefore, the high cooperative control effect of FTT can be considered to compensate for the deterioration of its classification accuracy. These findings have several important implications. It is considerably difficult to find an optimal cooperative control that considers the effect of current control on future traffic. However, the results presented here show that it is not necessary to consider such future effects, and it is sufficient to consider the travel times of the two conflicting vehicles (CVT). An important note is that this finding may depend on our experimental environment. We obtained these results by applying a variety of traffic scenarios, including extremely high turn rates for small intersections. Larger and cross intersections may yield different results and may require further discussion. Finally, we discuss the difference in travel-time reduction between the models established using the 1st and 2nd training data sets. Although the models based on FTT show no significant difference between the two, the models based on CVT show that the 2nd training data set tends to exhibit better performance. Therefore, additional learning allows a more optimal control to be selected for each conflict, particularly when the model is established based on the CVT index.

V. CONCLUSION AND FUTURE WORKS
This study investigated the effects of cooperative control by assuming a future traffic situation with CAVs. A traffic simulator and machine-learning models, DT and deep neural networks, were established. By using these models, the differences between short-term and long-term optimizations of cooperative merging control at a nonsignalized intersection were investigated. The short-term optimization considers the travel time of conflicting vehicles, whereas the long-term optimization considers the impact of cooperative merging control on future traffic. Although long-term optimization was expected to be better, the simulation results showed that the two have similar effects. This finding indicates that it is not necessary to consider future impacts. It is considerably difficult to find optimal cooperative control that considers the effect of current control on future traffic. Therefore, this finding has important implications for actual traffic control in the CAV society. Further improvements in machine-learning models are expected to allow for a more detailed discussion in future work. For example, vehicle control solutions that are rarely selected as optimal solutions were not appropriately considered in this study. In addition, it is necessary to investigate the parameters used in cooperative control. Control zones and prediction lines were set at 120 m from the intersection to realize the difference in impact by vehicle speed control. Further discussions regarding the locations of these zones that could be applied to shorter roads in urban areas are required. Additionally, the time scales evaluated using the FTT index ought to be investigated. In this study, the optimization time, $t_f$, was assigned the value of the time taken by all vehicles to leave the network. A comparative analysis of various time scales of the optimization time $t_f$ is required.
Finally, this study applied cooperative merging control to a network with only a single small intersection. In the simulations conducted for machine learning and validation, traffic scenarios were created with random vehicle generation. However, the actual traffic flow is affected by neighboring intersections; for example, vehicles approaching a target intersection may form a platoon if there is a signal-controlled intersection upstream. Thus, we ought to consider a network with multiple intersections and analyze the disturbance experienced by the vehicles following the controlled vehicle in the platoon.

APPENDIX A LANE CHANGING RULE
Lane change is assumed to be a move to the directly adjacent cell. Before the speed is updated and the vehicle moves forward, the lane change is performed using the following algorithm:

IF (Cells in the next lane are empty)
    IF (Go straight through the intersection)
        IF (w1 > w2 AND w1 > w3) THEN Lane change
    ELSEIF (Turn right/left at intersection)
        IF (w1 + w4 > w2 AND w1 + w4 > w3)
            IF (w5 = 0) THEN Lane change
            ELSEIF (w5 > 0 AND Not in the right/left lane) THEN Lane change to the right/left

where gap_o,t represents the gap to the vehicle ahead in the next lane, gap_b,t represents the gap to the vehicle behind in the next lane, d_t represents the number of cells to the intersection ahead, and w1 to w5 represent the weights for determining the lane change. When traveling on a multilane link, the vehicle must be in the far-right/far-left lane at the end of the link if it is planning to turn right or left at the next intersection (when traveling straight ahead, there is no restriction on the lane position). The urgency of changing lanes for a link transition increases as the vehicle moves closer to the end of the link, as expressed by w4. However, when a vehicle is already in the far-right/far-left lane for a right/left turn, it is desirable to avoid unnecessary lane changes and stay in that lane as the vehicle gets closer to the intersection. This effect is represented by w5. Parameters d* and d** are given as numbers of cells. In this study, d** = d* = 80 was set such that these effects were exerted approximately 200 m before the intersection.
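
The branching structure of the lane-changing rule in this appendix can be illustrated with a minimal Python sketch. It is not the simulator's implementation. The weights w1 to w5 are treated as precomputed inputs because their formulas are not reproduced in this text, and the names `LaneChangeWeights`, `decide_lane_change`, `STAY`, `CHANGE`, and `CHANGE_TOWARD_TURN_LANE` are hypothetical labels introduced only for illustration.

```python
from dataclasses import dataclass

# Hypothetical outcome labels (not from the paper).
STAY = "stay"
CHANGE = "lane_change"
CHANGE_TOWARD_TURN_LANE = "lane_change_toward_turn_lane"


@dataclass
class LaneChangeWeights:
    """Weights w1..w5 of the rule; assumed to be computed elsewhere."""
    w1: float
    w2: float
    w3: float
    w4: float  # urgency of reaching the turn lane near the link end
    w5: float  # incentive to stay once the turn lane is reached


def decide_lane_change(next_lane_cells_empty: bool,
                       maneuver: str,
                       w: LaneChangeWeights,
                       in_turn_lane: bool) -> str:
    """Return the lane-change decision for one vehicle in one time step.

    maneuver is "straight", "left", or "right" (movement at the next
    intersection); in_turn_lane says whether the vehicle already occupies
    the far-left/far-right lane needed for its turn.
    """
    # A lane change is only possible if the target cells are empty.
    if not next_lane_cells_empty:
        return STAY

    if maneuver == "straight":
        # Going straight: change only if w1 dominates both w2 and w3.
        if w.w1 > w.w2 and w.w1 > w.w3:
            return CHANGE
        return STAY

    if maneuver in ("left", "right"):
        # Turning: the urgency term w4 is added to w1 before comparing.
        if w.w1 + w.w4 > w.w2 and w.w1 + w.w4 > w.w3:
            if w.w5 == 0:
                return CHANGE
            if w.w5 > 0 and not in_turn_lane:
                return CHANGE_TOWARD_TURN_LANE
        return STAY

    raise ValueError(f"unknown maneuver: {maneuver!r}")
```

For example, under these assumptions a turning vehicle with w5 > 0 that already occupies its turn lane is kept in that lane, whereas the same vehicle outside the turn lane is moved toward it once w1 + w4 exceeds both w2 and w3.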

APPENDIX B TRAINING DATA STATISTICS
To build a machine-learning model applicable to a variety of traffic situations, a training dataset was created with 25 traffic situation settings. For each traffic situation, 1000 training data sets were created. Table 6 shows the average number of times each class was selected. Control patterns that include medium- and low-speed driving can be observed to increase in the FTT. In addition, the standard deviations show that the composition ratio of the classes changes depending on the traffic conditions.

SHINTARO KATAGIRI received the B.E. degree in civil engineering and the M.E. degree in civil and environmental engineering from Nagoya University, Japan, in 2020 and 2022, respectively. His research interests include cooperative control of mixed traffic flow consisting of automated and human-driven vehicles.
TOMIO MIWA received the B.E., M.E., and D.E. degrees in civil engineering from Nagoya University, Japan, in 1998, 2000, and 2005, respectively. He is currently an Associate Professor with the Institute of Materials and Systems for Sustainability, Nagoya University. His research interests include the analysis of driver's route choice behavior, traffic simulation, and transport management using ITS. He was a recipient of the Best Paper Award from the 11th EAST International Conference.
MUTSUMI TASHIRO received the B.E., M.E., and D.E. degrees in civil engineering from Nagoya University, Nagoya, Japan, in 2000, 2002, and 2005, respectively. She is currently a Lecturer with the Institution of Innovation for Future Society, Nagoya University.
TAKAYUKI MORIKAWA received the B.E. and M.E. degrees in transportation engineering from Kyoto University, Japan, in 1981 and 1983, respectively, and the M.S. and Ph.D. degrees in civil engineering from the Massachusetts Institute of Technology, USA. He is currently a Professor with the Institute of Innovation for Future Society, Nagoya University. He has written books on modeling travel behavior. He is a well-known figure on Japanese government panels. His research interests include transport planning, urban planning, transport policies, transport demand forecast, consumer behavior, and environmentally sustainable transport.