Urban Intersection Signal Control Based on Time-Space Resource Scheduling

This paper improves urban traffic control by increasing the dimension of the control variables, focusing on roads rather than vehicles. A new space-time resource scheduling model and a bi-level optimization control method for urban intersections are developed. Traditionally, the properties of a lane are fixed. With the development of new technologies, lane properties can now change, which increases the dimension of the control variables in the control model and expands its control capability. To this end, the space-time resource scheduling model for intersections includes spatial variables (lane genes, phases, and phase sequences) and time variables (green light time of phases). A new bi-level optimization control method is then developed, in which the upper layer performs lane control based on reinforcement learning and the lower layer performs phase control based on the model predictive control idea. Finally, comprehensive experiments show that the proposed method is more efficient than traditional methods.


I. INTRODUCTION
The essence of traffic flow change is the matching of traffic demand with space-time resources [1]. Traffic control [2] at an intersection is a way to allocate space-time resources while ensuring traffic safety. Therefore, the space-time resource variables that accurately describe the characteristics of intersections directly affect the flexibility and advancement of traffic control.
Today, the most widely used traffic signal control systems (such as SCOOT [3], [4] and SCATS [5], [6]) rely on expert experience to design the static variables (lanes, phases, and phase sequences) at intersections, and then use models and algorithms to implement the control strategy with the green light time as the main variable. However, the dynamic characteristics of the traffic flow often do not match the capacity of the road well, which makes it difficult for traditional traffic control based on the allocation of time resources to allocate road space resources effectively. Unfortunately, the insufficient allocation of space resources is one of the important reasons for traffic congestion.
The associate editor coordinating the review of this manuscript and approving it for publication was Nabil Benamar.
In recent years, the development of vehicle-road collaboration [7], [8], autonomous driving [9]-[11], artificial intelligence [12], [13], and other technologies has provided new ideas for the efficient scheduling of space-time resources at intersections, especially by enabling precise control of lanes and vehicles. Surprisingly, although these new technologies have been widely used in the field of traffic control [14], [15], they are still applied mainly to the allocation of time resources within the traditional theoretical framework of traffic control, and no one has tried to increase the dimension of the control variables and study the resulting extension of control capabilities. If all entrance lanes at an intersection are treated as dynamic variables and controlled, two new changes become available for intersection control: 1) adjusting the properties of the lanes changes the capacity and phase order to better match the fluctuation of the traffic flow, including the mixing of multiple traffic flows, as shown in figure 1(a) and figure 1(b); 2) adjusting the attributes of the lanes changes the distribution pattern of the traffic flow, and hence the OD matrix of the road network, as shown in figure 1(b) and figure 1(c). It can be inferred that this will greatly expand the control capacity of the intersection.
Accordingly, the main content of this paper is to use the time-space resource scheduling model and two-layer optimization to achieve future urban intersection traffic control. The main contributions of this paper include:
• We propose a time-space resource scheduling model for future city intersections. The model treats the lane as a control variable, so the phase and phase order in the model also change greatly. Increasing the dimension of the control model variables extends the ability of the control model.
• We carefully consider three kinds of constraints, on lane variable expression, on phase division, and on the model itself, to ensure the rationality of the model. In addition, a two-level optimization method is designed, with lane control based on reinforcement learning and phase control based on model predictive control.
• We further demonstrate the superiority of this method. Specifically, it has advantages over traditional methods in terms of the effectiveness and flexibility of intersection control.
The remainder of this paper is organized as follows: Section II presents a literature review of related work. Section III designs a time-space resource scheduling model for future city intersections, where the space variables are lane genes, phases and phase sequences, and the time variables are phase green time. The three constraints of the model are determined in detail. Section IV proposes a two-layer optimization control method. The upper layer is lane control based on reinforcement learning and the lower layer is phase control based on model predictive control ideas. Section V describes the simulation experiments and presents the experimental results. Section VI concludes this paper and discusses directions for future research.

II. RELATED WORK
Urban intersection control is an age-old issue that goes back to the 1950s [16]. It remains one of the most important issues in the field of urban traffic, even in a future of vehicle-road collaboration [17] and widespread autonomous driving [18]. It has therefore attracted many scholars. Existing methods can be divided into two categories.

1) TRADITIONAL INTERSECTION TRAFFIC CONTROL
Building on traditional urban road traffic control theory, models and algorithms from modern control, intelligent control, artificial intelligence, and other theories have achieved considerable development and application in the field of traffic control. Modern control theory assumes that the mathematical model of the controlled object is known. Traffic control methods based on modern control theory are mostly called model-based traffic control theory and methods [14], [15], [19]. At the beginning of this century, advances in traffic information and detection technology greatly improved the types and accuracy of traffic detection data. At the same time, the explosive growth of road travel demand stretched traditional traffic control methods to their limits. Researchers began to consider data-driven traffic control theory and methods [20], [21], because when it is difficult to develop a model for a controlled system, the system input and output data can be used to implement control and decision-making. In recent years, breakthroughs in artificial intelligence theory and methods and the evolution of large-scale cloud computing and edge computing technologies have promoted the development of new types of intelligent control centered on artificial intelligence methods. Some scholars have proposed artificial intelligence-based traffic control theory and methods [22]-[25], which are characterized by advancement, prevention, and initiative. - [32]. However, to the best of our knowledge, these methods struggle to obtain satisfactory results in terms of the effectiveness and flexibility of current and future intersection control. The main reason is that they are limited by traditional traffic control theories, which generally use only phase, phase sequence, cycle, and green ratio as control variables.
Therefore, traditional traffic control theory studies the allocation of road time resources and has difficulty allocating road space resources effectively. However, road congestion is often caused by a mismatch between the dynamic characteristics of traffic flow and the capacity of the lane. The study of road time resource allocation in current and future intersection control has reached its ceiling. Therefore, the coordinated allocation of space-time resources will be an effective way to resolve this dilemma.

III. PROPOSED MODEL
$$n_{j,a}(k+1) = n_{j,a}(k) + q_{j,a,\mathrm{in}}(k) - q_{j,a,\mathrm{out}}(k) \tag{1}$$
In the formula, $n_{j,a}(k)$ is the number of vehicles on road j, a in period k, $q_{j,a,\mathrm{in}}(k)$ is the number of vehicles entering road j, a in period k, and $q_{j,a,\mathrm{out}}(k)$ is the number of vehicles leaving road j, a in period k.
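As a minimal sketch of the vehicle-count update described above (next count equals current count plus inflow minus outflow), the following Python fragment steps the balance over several periods. The function names and the sample flows are illustrative assumptions, not part of the paper.

```python
def store_and_forward_step(n_k, q_in_k, q_out_k):
    """One balance step: n(k+1) = n(k) + q_in(k) - q_out(k)."""
    n_next = n_k + q_in_k - q_out_k
    if n_next < 0:
        # More vehicles left than were present: the inputs are inconsistent.
        raise ValueError("outflow exceeds the vehicles present on the road")
    return n_next


def simulate_road(n0, inflows, outflows):
    """Return the vehicle count on one road for every period, starting at n0."""
    history = [n0]
    for q_in, q_out in zip(inflows, outflows):
        history.append(store_and_forward_step(history[-1], q_in, q_out))
    return history


# Hypothetical flows (vehicles per period) for a single road.
counts = simulate_road(10, inflows=[5, 3, 0], outflows=[2, 6, 4])
```

The outflow in each period would in practice depend on the green time granted to the road, which is what the comprehensive model couples to this balance.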

1) THE SPACE VARIABLES
To accurately characterize the dynamic characteristics of lane properties, the paper first proposes the concept of ''lane genes''. The turning properties of the lane thereby become a control variable. As shown in figure 3(a), the turning properties of the lane include left turn, through (straight) movement, and right turn, which are denoted L, T, and R.
Further, the smallest unit of traffic scheduling at the intersection is composed of the turning attribute of the entrance lane of the intersection and the downstream link, as shown in figure 3. A lane consists of three genes. In the formula, j,a (k) is the set of spatial variables.
j,a,o (k) is the control variable, which is a function of the number of lanes, as shown in equation (3).
In the formula, j,a o (k) is the number of identical genes after lane gene expression.

2) COMPREHENSIVE MODEL
From the spatial variables (2) and the store-and-forward model (1), we obtain the comprehensive model. In the formula, $S_{j,a}$ is the capacity of the road section, $g_{j,a,o}(k)$ is the green light time of the phase of road section j, a in sampling period k, and $g_{j,a,o}(k) \ge g_{j,a,o,\min}$. $g_{j,a,o}(k)$ is obtained from the following solution space.

3) SOLUTION SPACE
The solution space is built from three sets: the expression set of the lane genome in all directions of the entrance section of the intersection (equation (5)); the set of phase combinations of the lanes in all directions of the entrance section when a fixed genome is expressed (equation (6)); and the set of fixed phase combination time sequences of the lanes in all directions of the entrance section when a fixed gene is expressed (equation (7)). From equations (5), (6) and (7), the solution space of the lane gene, phase and phase sequence relationship of the intersection is obtained, as shown in figure 4.
where, ω X x j,a (k) of equation (4) is to find a feasible solution in the constrained solution space of {gene, phase, phase sequence} within the period k, and the number of phases belonging to road j, a in the phase combination can be obtained, as shown in equation (9).
In the formula, the o of the green time g j,a,o (k) and the spatial variable j,a (k) are related to the number of phases, so they can be represented by ω X x j,a (k). min{ω j,a (k), ω X x j,a (k)} reflects that the number of upstream and downstream links is not the same as the number of phases. The reason is that the upstream and downstream links cannot be disconnected, but the phases can not be subordinate to the phase sequence within the sampling period k.

B. CONSTRAINTS ON SOLUTION SPACE
The solution space of the model must be constrained to make it reasonable. Although lane gene expression is generally based on the above requirements, that is, the number of intersections is not increased or decreased, it is also necessary to consider, during the design process, the control of some intersections with special traffic requirements, as shown in figure 5.

2) CONSTRAINTS ON PHASE DIVISION
Assuming that a traffic flow occupies at least one lane, a phase is composed of the signal status of the traffic flows on one or more lanes. According to the genomic expressions of all lanes on the road sections of the intersection, the set P x (k) = {O x II (k)} II =1,2,...,σ of intersection phase combinations can be obtained. The requirement of phase division is that the number of severe conflict points among the one or more traffic flows sharing the same signal status is zero.
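The phase division requirement (no severe conflict among traffic flows that share a signal status) can be illustrated with a small check. This is a sketch under assumptions: a movement is written as a hypothetical (approach, turn) pair using the L, T, R turn labels, and the conflict table below is invented for illustration rather than taken from the paper.

```python
from itertools import combinations


def is_valid_phase(movements, severe_conflicts):
    """Return True when no pair of movements in the phase conflicts severely."""
    return not any(
        frozenset(pair) in severe_conflicts
        for pair in combinations(movements, 2)
    )


# Hypothetical severe-conflict table: each entry is an unordered pair of movements.
SEVERE_CONFLICTS = {
    frozenset({("N", "T"), ("E", "T")}),  # crossing through movements
    frozenset({("N", "L"), ("S", "T")}),  # left turn across the opposing through flow
}

opposing_through = [("N", "T"), ("S", "T")]
left_with_opposing = [("N", "L"), ("S", "T"), ("S", "R")]
```

Here `is_valid_phase(opposing_through, SEVERE_CONFLICTS)` accepts the phase, while the second candidate is rejected because it contains a listed conflict pair.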

3) CONSTRAINTS OF THE MODEL
According to the dynamic allocation model of space-time resources, an optimization scheme for lane gene expression can be calculated for each optimization cycle T. When the optimization scheme is inconsistent with the current operational scheme, it is necessary to judge whether to adopt the optimization scheme or maintain the status quo. A different lane gene expression need not be generated every time. On the one hand, when the change in traffic demand is small, the benefits brought by lane control may be small; on the other hand, changing lane functions is costly, as it may confuse drivers and introduce hidden safety risks. Therefore, the lane function, as the more stable part (slow variable) of intersection control, should be distinguished from more easily changed parameters (fast variables) such as signal control [33]. Lane control should be used only when the supply and demand relationship at the intersection changes significantly, such as for tidal traffic, emergency rescue, emergencies, traffic congestion, intersection deadlock, and bus priority; general fluctuations in traffic demand can be handled by adjusting signal control. Generally, the following factors need to be considered:

a: CONSTRAINTS ON THE FREQUENCY OF SCHEME CHANGES
Lane control is different from the dynamic optimization of signal control. The former is a reallocation of space resources at intersections, and the latter is a reallocation of time resources at intersections. By changing lane attributes, lane control is directly perceived by drivers and has a direct impact on their driving behavior. Therefore, lane control, as the more stable part of intersection control, should be limited in how often it changes. Here, the minimum interval between changes is set to 10 min. It is expressed as follows: In the formula, t is the running time of the current scheme. $h_1(t)$ is the constraint condition for the frequency of scheme changes; 1 means satisfied, 0 means not satisfied.
sampling cycles are the same. It is expressed as follows: In the formula, A(T) is the lane control scheme calculated at the control period T. $h_2(t)$ is the stability constraint for the change in traffic demand; 1 means satisfied, 0 means not satisfied.

c: CONSTRAINTS ON SPECIAL TRAFFIC DEMAND
When special needs such as emergency rescues and emergencies occur, lane control can be implemented through human intervention or special event index parameters in order to respond quickly and effectively. It is expressed as follows: In the formula, $h_3(t)$ is the constraint for special needs; 1 means satisfied, 0 means not satisfied. $Z \in \{0, 1\}$, where 1 means a special event occurred and 0 means no special event occurred.
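The three indicator constraints can be sketched as simple 0/1 functions. Only the 10-minute limit and the meaning of each indicator come from the text. The rest is an assumption made for illustration: the number of identical consecutive schemes required for stability (`required_repeats`), and the rule that a special event overrides the other two checks.

```python
MIN_HOLD_SECONDS = 10 * 60  # from the text: at most one scheme change per 10 min


def h1(running_time_s):
    """Frequency constraint: 1 once the current scheme has run long enough."""
    return 1 if running_time_s >= MIN_HOLD_SECONDS else 0


def h2(recent_schemes, required_repeats=3):
    """Stability constraint (assumed form): 1 when the last few computed
    lane schemes are identical, i.e. the demand change is persistent."""
    if len(recent_schemes) < required_repeats:
        return 0
    tail = recent_schemes[-required_repeats:]
    return 1 if all(scheme == tail[0] for scheme in tail) else 0


def h3(z):
    """Special-demand constraint: 1 when a special event flag Z is raised."""
    return 1 if z == 1 else 0


def may_change_lanes(running_time_s, recent_schemes, z, required_repeats=3):
    """Assumed combination rule: a special event alone permits a change;
    otherwise both the frequency and stability constraints must hold."""
    if h3(z):
        return True
    return bool(h1(running_time_s) and h2(recent_schemes, required_repeats))
```

For example, `may_change_lanes(700, ["A", "A", "A"], 0)` permits a change, while the same call with a running time of 100 s does not.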

IV. OPTIMIZATION METHOD
The dual-layer optimization algorithm is designed to take into account the different control frequencies of the lane attribute variables and of the phase, phase sequence, and green light time in the dynamic allocation model of space-time resources at intersections. The upper layer of the algorithm is lane control based on reinforcement learning, and the lower layer is phase control based on model predictive control ideas, as shown in figure 6. When the initialization scheme starts executing, an all-red phase is inserted at the end of the initialization scheme, and $J_S$ is then evaluated to decide whether to execute lane control. If $J_S \ge 0$, the lane remains unchanged and phase control is performed. If $J_S < 0$ holds for n consecutive $T_s$, lane control is activated: according to the indicator, an action is selected from the lane gene expression action set, the adjustment is completed within the all-red time, and phase control is then executed.
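The switching rule between the two layers can be sketched as a small counter over successive capacity-factor values. The class name, the returned labels, and the choice to reset the counter after lane control fires are assumptions made for illustration; the text only states that lane control starts after n consecutive periods with a negative capacity factor.

```python
class LaneControlTrigger:
    """Counts consecutive periods with a negative capacity factor J_S."""

    def __init__(self, n_required):
        self.n_required = n_required
        self.negative_streak = 0

    def update(self, j_s):
        """Return which layer acts in this period: lane or phase control."""
        if j_s < 0:
            self.negative_streak += 1
        else:
            self.negative_streak = 0  # J_S >= 0: lanes stay unchanged
        if self.negative_streak >= self.n_required:
            self.negative_streak = 0  # assumed: start counting afresh
            return "lane_control"
        return "phase_control"


# Hypothetical sequence of capacity-factor values, one per period.
trigger = LaneControlTrigger(n_required=3)
decisions = [trigger.update(j_s) for j_s in [-0.1, -0.2, 0.1, -0.1, -0.1, -0.3, -0.2]]
```

In this sequence the single non-negative value resets the count, so lane control is only selected at the sixth period.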
The capacity factor is one of the constraints, and it is related to the capacity of intersections [34]. The traffic flow balance constraint ensures that each traffic flow (signal group) has sufficient green light time, that is, the actual capacity of each traffic flow is greater than its average flow. However, a signal scheme must be selected so that this condition is satisfied even if the flow rate changes. The flow rate can change in several ways: some traffic flows decrease, some increase, and some remain unchanged. To make this practical, the capacity factor is introduced as the basis for signal switching at the intersection. The capacity factor is as follows: In the formula, $J_S$ is the capacity factor of the intersection. $J_M$ is the passing rate of the intersection, which is the ratio of the number of passing vehicles to the total number of vehicles. $J_N$ is the blocking rate of the intersection, that is, the ratio of the remaining queued vehicles to the total required vehicles.
In the formula, P control = n o=1 S j,a · g j,a,o (k) is the capacity of the intersection. n queue = n o=1 n j,a,o (k) n j,a is the total number of remaining queued vehicles at the intersection.
Remark 2: For a given input flow and control step size, $J_S < 0$ means that the capacity of the intersection is insufficient, so the blocking rate at the intersection will continue to increase and the control effect will continue to deteriorate under the current control scheme. If no other measures are applied, the queue will overflow, that is, no signal scheme can meet the capacity requirements. In this case, the intersection capacity is considered to be very poor. $J_S = 0$ means that capacity and demand at the intersection are balanced: the vehicle passing rate and the blocking rate remain unchanged, and the queue length at the intersection is stable. In this case, the intersection is considered to have better traffic capacity. $J_S > 0$ means that the traffic capacity at the intersection is sufficient: the vehicle passing rate continues to increase, the control effect continues to improve, and the queue length is gradually reduced. In this case, the traffic capacity at the intersection is also considered to be better.
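To make the three cases of Remark 2 concrete, the sketch below computes a capacity factor from vehicle counts. The defining formula is not reproduced in this text, so the form used here, passing rate minus blocking rate, is an assumption chosen only because it is negative, zero, or positive in the way the remark describes. Using one shared total for both ratios is a further simplification.

```python
def capacity_factor(passed, queued, total):
    """Assumed form J_S = J_M - J_N, with J_M = passed / total (passing rate)
    and J_N = queued / total (blocking rate)."""
    if total <= 0:
        raise ValueError("total number of vehicles must be positive")
    j_m = passed / total
    j_n = queued / total
    return j_m - j_n


def interpret(j_s, tol=1e-9):
    """Map J_S onto the three cases of Remark 2."""
    if j_s < -tol:
        return "insufficient"  # queues keep growing under the current scheme
    if j_s > tol:
        return "sufficient"    # queues shrink
    return "balanced"          # queue length is stable


# Hypothetical counts for one control period.
j_s = capacity_factor(passed=30, queued=10, total=40)
```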

A. LANE CONTROL BASED ON REINFORCEMENT LEARNING
1) STATE SPACE
The maximum queuing length $N(k) = \max\{n_i(k) \mid i = 1, 2, \ldots, m\}$ of the phases at all entrances of the intersection is used as the state (state matrix) of the intersection when $J_S < 0$ holds for n consecutive $T_s$.

2) ACTION SPACE
The gene expression set of the lanes at all entrances of the intersection is used as the action space. When a state is observed, an action must be selected from the currently available action set, that is, a set of gene expressions is selected from the genome set of the intersection. The principle of action selection is that the mean square error between the genomic expression in the previous state and the currently selected genomic expression at the intersection should be small, and that the capacity of the direction is increased by increasing $J_S$, which determines the phase.
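The "small mean square error" part of the action selection principle can be sketched as a nearest-candidate search. The encoding is an assumption for illustration: each lane is written as three 0/1 flags for (L, T, R), and the lanes of an approach are concatenated into one flat list. The capacity-factor criterion is not modeled here.

```python
def mse(a, b):
    """Mean square error between two equally long gene-expression vectors."""
    if len(a) != len(b):
        raise ValueError("gene-expression vectors must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def closest_expression(previous, candidates):
    """Pick the candidate expression that changes the lanes the least."""
    return min(candidates, key=lambda c: mse(previous, c))


# Hypothetical three-lane approach: left-only, through-only, through+right.
previous = [1, 0, 0, 0, 1, 0, 0, 1, 1]
candidates = [
    [1, 0, 0, 1, 0, 0, 0, 1, 1],  # middle lane becomes a second left-turn lane
    [0, 1, 0, 0, 1, 0, 0, 0, 1],  # first lane becomes through, third right-only
]
chosen = closest_expression(previous, candidates)
```

The first candidate differs from the previous expression in two flags and the second in three, so the first is selected.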

4) ALGORITHM STEPS
Step1 Initialize Q to an arbitrary value.
Step2 The initial state s consists of the queuing state N(k) at the intersection at time k.
Step3 Using the experience of the Q value, select an action a according to the strategy from the feasible lane genome action set A corresponding to the state s.
Step4 Perform action a. Observe the reward r and the new queue state s' at the intersection.
Step5
Step6 Assign s' to s.
Step7 $J_S < 0$ holds for n consecutive $T_s$.
Step8 Repeat Step3 to Step6 until the Q value converges.
Note: r is the reward of the intersection capacity coefficient, α is the learning rate, γ is the discount factor.
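The steps above resemble tabular Q-learning, so a minimal sketch of that standard update is given here. This is an assumption: the update rule for Step5 is not reproduced in the text, and the standard Q-learning rule is used only because the note mentions a learning rate α and a discount factor γ. State labels, action labels, and the ε-greedy strategy are likewise illustrative.

```python
import random
from collections import defaultdict


def choose_action(q, state, actions, epsilon, rng):
    """Epsilon-greedy choice over the feasible actions of a state (Step3)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Standard Q-learning update, assumed here for the missing Step5:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Hypothetical discretized queue states and lane-expression actions.
q = defaultdict(float)                      # Step1: initialize Q (here to zero)
first = q_update(q, "long_queue", "expr_0", 1.0, "short_queue",
                 ["expr_0", "expr_1"], alpha=0.5, gamma=0.9)
q[("short_queue", "expr_1")] = 2.0          # pretend this value was learned earlier
second = q_update(q, "long_queue", "expr_0", 1.0, "short_queue",
                  ["expr_0", "expr_1"], alpha=0.5, gamma=0.9)
greedy = choose_action(q, "short_queue", ["expr_0", "expr_1"], 0.0, random.Random(0))
```

In the paper's setting the reward would be derived from the capacity factor and the state from the maximum queue length; both are left abstract here.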

B. PHASE CONTROL BASED ON MODEL PREDICTIVE CONTROL IDEAS
1) MODEL PREDICTIVE CONTROL IDEAS
The phase control algorithm is designed using model predictive control ideas [35], as shown in figure 7. The inputs are the traffic flow and queuing at the intersection when a phase is executed in period k. In the solution space, n phases compatible with the currently executing phase are selected as candidates for the next phase, and for each candidate m successive phases are selected to form a control chain. An objective function J is constructed, and a genetic algorithm is used to optimize the n control chains. The resulting J values of the n control chains are sorted, and the first phase of the control chain with the smallest J is used as the next phase to execute. The obtained interval time, phase, and green time are the outputs.

2) ALGORITHM STEPS
Step1 Execute the current phase and green time. When G i lock is executed, the traffic flow and queuing status of each section of the current intersection are output.
Step2 Start the phase control chain prediction, and select the control chain scheme group compatible with the currently executed phase from the set of phase control chain scheme groups. Take the traffic flow and queuing status from Step1 as input. With J as the objective function and the genetic algorithm as the optimization algorithm, all the schemes in the compatible phase control chain scheme group are evaluated separately. Rank the J values obtained by all control chain schemes and output the first phase, green time and interval time of the best-ranked compatible phase control chain scheme. This process uses asynchronous multi-threaded calculation, and the calculation time is G i lock.
Step3 Output the interval time, phase, and green time calculated in Step2 to the main process. They are executed when G i lock of the current phase ends.
Remark 3: The concept of the control chain is proposed for the first time, and it differs from the cycle in traditional intersection control. The control chain has no periodic characteristics, so the control step size of the model cannot be determined. To this end, the decisive step size consisting of decision points and decision time is called G i lock. Generally, G i lock = 5 s is used, which satisfies the requirements on calculation and decision-making time within the limits of computing power while keeping decisions effective, as shown in figure 8(b).
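The control-chain selection can be sketched as a small receding-horizon search. Several simplifications are assumed and are not from the paper: the genetic algorithm is replaced by exhaustive enumeration over a few green-time options, the objective J is taken to be the total remaining queue, all movements share one saturation flow, and interval (all-red) time is ignored.

```python
from itertools import product


def simulate_chain(queues, arrivals, chain, greens, sat_flow):
    """Assumed objective J: total queue left after running the chain."""
    q = dict(queues)
    for phase, green in zip(chain, greens):
        for move in q:
            q[move] += arrivals[move] * green  # vehicles arriving during this green
            if move in phase:
                # Movements served by the phase discharge at the saturation flow.
                q[move] = max(0.0, q[move] - sat_flow * green)
    return sum(q.values())


def best_next_phase(queues, arrivals, chains, green_options, sat_flow):
    """Evaluate every chain and green-time combination and return
    (J, first phase, first green time) of the best one."""
    best = None
    for chain in chains:
        for greens in product(green_options, repeat=len(chain)):
            cost = simulate_chain(queues, arrivals, chain, greens, sat_flow)
            if best is None or cost < best[0]:
                best = (cost, chain[0], greens[0])
    return best


# Hypothetical two-movement intersection; a phase is a tuple of served movements.
queues = {"NS": 20, "EW": 5}
arrivals = {"NS": 0.0, "EW": 0.0}          # vehicles per second
chains = [(("NS",), ("EW",)), (("EW",), ("NS",))]
result = best_next_phase(queues, arrivals, chains, green_options=[10, 20], sat_flow=0.5)
```

Only the first phase and green time of the winning chain are applied; the search is repeated at the next decision point with fresh measurements, which is the receding-horizon idea behind the control chain.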

V. EXPERIMENTAL VERIFICATION
A. SIMULATION CONDITIONS
VISSIM was used to compare the proposed method with timing control to verify its effectiveness. The input data for VISSIM are the field data of the intersection of Shengli East Street and Siping Road in Weifang City, Shandong Province, China. Each simulation cycle is 36000 s, and the average value of ten simulations is calculated for the comparison. The evaluation indicators are the total travel time at the intersection and the total queue length. The details are given in table 1 and table 2.

B. EXPERIMENTAL VERIFICATION
Traffic control of the intersection is simulated under a given traffic flow, and the undersaturated and oversaturated demand conditions of the intersection are simulated with different input flows. The control method in this paper is compared with timing control, which is computed offline.
The flow, density, and speed at the entrance and exit sections of the intersection are analyzed, as shown in figure 9(a)-9(f).
In figure 9, road 1, road 3, road 6 and road 8 are intersection entrance roads, and road 2, road 4, road 5 and road 7 are intersection exit roads. As shown in figure 9(a) and 9(b), at the beginning of the simulation, the flow on the entrance roads is large and the flow on the exit roads is small. As the simulation time and input flow change, timing control and the control in this paper produce similar trends in the input and output flow at the intersection, but under the control method in this paper the changes in the flow rate of road 1 and road 8 are larger than under timing control. As shown in figure 9(c) and 9(d), at the beginning of the simulation, the density of the entrance roads is relatively high and the density of the exit roads is relatively low. As the simulation time and the input flow change, timing control and the control in this paper produce similar trends in the density of the entrance roads, but under the control method in this paper the density changes of road 3 and road 6 are larger than under timing control. As shown in figure 9(e) and 9(f), the speed on the entrance and exit roads of the intersection is relatively low during the simulation. Timing control and the control in this paper produce nearly the same speed trend on the exit roads. However, entrance road 3 is affected by the control method in this paper, and the speed of road 6 is higher than under timing control. In summary, the changes in the entrance and exit flow, density, and speed of the intersection conform to the law of traffic flow changes, and the traffic flow parameters of the intersection are better when the control in this paper is adopted. Table 3 compares the total travel time and average queuing length of vehicles at the intersection under different control methods (the method of this paper and [36]).
Table 3 shows that the total travel time of vehicles passing through the intersection under the control method in this paper is lower than under reference 38, and the average speed on the road is higher. The average queuing length at the intersection under the control method in this paper is also lower than under reference 38. It follows that the control in this paper has a better control effect than reference 38. However, the average queue length at the intersection under reference 38 is longer than the average queue length at the intersection under the control in this paper when the simulation time is 8100 s, as shown in table 3. The reason is that the input flow at the intersection changes greatly, and the traffic flow needs a certain adaptation time when the control method in this paper is adopted.

VI. CONCLUSION
In this paper, we propose a space-time resource scheduling model and a double-layer optimal control algorithm for future urban intersections, which take the road as the main object of control, inspired by vehicle-road coordination technology. A detailed process of space-time resource modeling is given. The space-time model variables are designed to increase the dimension of the model variables and expand the control ability. Furthermore, a two-level optimal control method is designed, with lane control based on reinforcement learning and phase control based on model predictive control, and a comprehensive experiment is carried out with simulation data. The results show that the method is superior to the traditional traffic control method in both control effect and control flexibility. In addition, the potential of the space-time resource scheduling model is also reflected in the control of more complex mixed traffic flows at intersections, such as emergency vehicles and bus priority. However, there are still two shortcomings in this paper: 1) the influence of lane changes on drivers' driving behavior is not considered; 2) the chain connection of lane control and phase control is designed from manual experience, because the computational cost of traversing all combinations in the solution space is too high. In the future, we will study these two problems in detail and try to extend the method to area-wide traffic control.
LINGYU ZHANG received the B.S. degree in electrical engineering and automation and the M.S. degree in control engineering from the North China University of Technology, in 2014 and 2017, respectively, where she is currently pursuing the Ph.D. degree. Her research interests include intelligent traffic signal control, human-machine cooperative driving, and driving behavior analysis.
VOLUME 9, 2021