Coordination Between Connected Automated Vehicles and Pedestrians to Improve Traffic Safety and Efficiency at Industrial Sites



I. INTRODUCTION
The prospect of Connected Automated Vehicles (CAVs) pushes researchers to revise traditional traffic management. Recent literature points out that cooperation between vehicles and their self-organization abilities promise significant gains in traveling time. The benefits of CAVs are not limited to the future urban environment, as they are also needed to improve the performance of manufacturing systems. First, CAVs are more easily exploitable in controlled industrial sites. Second, they are eagerly anticipated for realizing efficiency gains in handling operations.
The associate editor coordinating the review of this manuscript and approving it for publication was Cunhua Pan .
Third, the current advances toward Industry 4.0 encourage the deployment of CAVs by providing a conducive environment for driving automation and connectivity [1].
Currently, CAV technologies extend the advantages of autonomy and connectivity to a broader zone, at the cost of a higher interaction overhead. Autonomous Intelligent Vehicles (AIVs) have introduced better navigation, optimizing their paths dynamically and bypassing obstacles [2]. When obstacle bypassing is impossible, local conflicts arise between mobile agents, such as workers, robots and vehicles, which must safely share a segment of the road. This naturally leads to lower traffic efficiency and a loss of production capacity. Therefore, many papers suggest novel approaches for improving the performance of the intersection network of CAVs. The CAVs negotiate together to optimize their crossing sequences and trajectories (speed profiles) [3]. In this way, they avoid a complete stop and respect their running schedules [4]. However, when it comes to an intersection between humans and CAVs, crossing efficiency is still a challenge. Many research works focus on pedestrian detection [5] and cooperative collision warning systems, in which CAVs help one another to detect hidden pedestrian obstacles (e.g., in turning movements). This cooperation enhances the safety level of the pedestrian crossing. Additionally, some works explore signaling systems, where a green/red light displayed on the front of the CAV improves the readability of the CAV's intention [6], [7]. Nevertheless, the tested speed profile of CAVs is limited to a basic slowing-down behavior for avoiding a collision with the obstacle or coming to a complete stop to yield the way. Although front signaling is required, it is neither the only piece of information used for making a decision nor the primary one, as discussed in [8].
For instance, a CAV that accelerates and then abruptly comes to a stop in the last meters considerably prolongs the pedestrian's reaction time [9].
In industrial sites, smooth traffic of all agents, whether CAVs or humans, is required to enhance productivity. A complete stop of each agent to yield the way has a cost, either in precious production time or in degraded working conditions (e.g., less break time). The road-sharing issue in industrial sites can thus be regarded as an optimization problem. The novelty introduced by the paper is the enhancement of crossing efficiency by considering that the pedestrian and the CAV are playing a cooperative game. Both need to find a compromise for moving as close as possible to their desired speed. As a result, the CAV and the pedestrian are considered as a team that works together to make the crossing more efficient in industrial sites. To this end, in addition to the displayed signal and the CAV's connectivity, the paper introduces a communicative speed profile of CAVs for gaining time.
The cooperative game played by both agents can be formally solved using Pontryagin's maximum principle applied to a multi-agent system [10]. However, because the pedestrian parameters are unknown and random, the model is not suited to real crossing conditions, raising safety [11] and convenience issues. To address this matter, in this paper we use Deep Reinforcement Learning (DRL), where the CAV agent (a single agent) learns to play safely and efficiently with randomly simulated pedestrians. More precisely, Deep Deterministic Policy Gradient (DDPG) is used because it allows the stable learning [12] needed for this application. DDPG is a model-free off-policy algorithm for learning continuous actions. In addition to DRL, a Model Predictive Control (MPC) [13] approach based on Quadratic Programming (QP) is used for comparison. Both control approaches are used to control virtual CAVs that interact with real pedestrians crossing the road, using immersive headsets (Virtual Reality technology).
This paper introduces an innovative approach to pedestrian-autonomous vehicle interaction in an industrial site. The approach starts from the study of multi-agent control, which exhibits a novel behavior of both agents, allowing valuable time savings in the industrial environment. Nevertheless, the optimal control can only be accurately applied to the vehicle. This paper uses DRL to benefit from this performance while keeping the pedestrian crossing safe. The main novelties introduced in the paper are:
• Speed profile as a vector of communication between CAVs and pedestrians. In addition to the signaling systems currently widely studied in the literature, the speed profile of the CAV needs to be considered, especially because it is an important parameter for the pedestrian's decision-making, as reported by many previous studies. This paper goes beyond observation, optimizing the speed profile and adapting it to the pedestrian's behavior during the crossing.
• An efficient and easy two-step approach, with details of the set parameters, to gain time through the CAV's speed profile in industrial sites. The two steps are as follows:
-Experiments: Real testers cross a virtual road using an Oculus Quest headset, to obtain their speed profiles according to CAV behaviors based on the multi-agent optimal control.
-Controller: Setting up the DRL controller of the CAVs and training it with fuzzed experiment data, to consider not only the randomness of the pedestrian behavior but also dangerous situations.
• Saving time, both for the pedestrian (32.76%) and the vehicle (38.01%). For the former, the speed profile provides the appropriate safety margin. For the latter, the paper introduces an optimal state that the CAV reaches when the pedestrian exits.
The rest of the paper is structured as follows: Section II states the problem and presents a theoretical model. The discussion of the model shows the limitations of the traditional control approach in coping with the random behavior of the pedestrian. Because of these limitations, the DRL method is proposed. Section III designs a suitable DRL model that can handle the above limitations. Its essential parts are the model of pedestrian crossing behaviors, the network structure of the DDPG agent, and the reward function. Section IV presents the experimental results under the DRL controller and QP. Traffic safety and efficiency are analyzed and compared. Section V concludes the paper and outlines future work.

II. PROBLEM STATEMENT AND ANALYSIS
Before introducing the theoretical analysis of the crossing system, it is important to emphasize that the proposed optimization system is only one part of a complex system of interaction with the pedestrian. In addition to the signaling system that displays a color to the pedestrian, we consider that CAVs are equipped with an emergency braking system and an onboard ITS station to communicate with one another. The former allows the CAV to avoid or mitigate a collision with the pedestrian. The latter is used to communicate the presence of the pedestrian to the following CAVs. In the following, the theoretical problem for the orthogonal intersection of both agents, i.e., pedestrian and CAV, is formally written. The optimization problem can easily be extended to a stream of CAVs. The optimal solution is discussed from the safety and efficiency standpoints. This discussion invites us to add new considerations to the initial problem, to keep the advantage of the approach while bearing the randomness of the pedestrian behavior.
A. THEORETICAL PROBLEM STATEMENT AND ANALYSIS
Fig. 1 presents the elementary crossing system. As shown in this figure, a CAV is moving at its desired speed v_v on a single-lane road, while a pedestrian on the side of the road wants to cross. It is assumed that the pedestrian's desired speed is v_p. An area called the ''Conflict Zone'' (CZ), shown as the red rectangle, is specified, where the CAV and the pedestrian are forbidden to be at the same time for safety reasons. Consider that both road users would move as close as possible to their desired speed, i.e., v_v for the vehicle and v_p for the pedestrian. The objective is expressed by equation (1), where u_v(t) and u_p(t) are the input accelerations of the CAV and the pedestrian, respectively, whereas v_v(t) and v_p(t) are the resulting speeds. w_p is a weighting factor of the pedestrian and t_f is the time horizon of the optimization. w_p ∈ [1, +∞) acts as a cursor to give more or less of an advantage to the pedestrian. This allows considering the difference between the speed capabilities of the two agents. t_f is defined to allow both agents to regain their desired speed after they cross CZ; it represents the time when both agents reach their desired speed. Formally, it is written as t_f = max(t_v,f, t_p,f), where t_k,f is the time needed for agent k to recover v_k. One can note from the definition of the objective function in equation (1) that J(u_v, u_p) penalizes the deviation during t_f. The objective function (1) indicates that the optimal control inputs u_v(t) and u_p(t) allow the CAV and the pedestrian to cross the CZ at their desired speed as much as possible. The objective is thus to find the optimal control under certain constraints to improve traffic efficiency.
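Equation (1) itself did not survive extraction. Based on the surrounding description (a quadratic penalty on the deviation of both agents' speeds from their desired values over the horizon t_f, weighted by w_p), a plausible reconstruction, using the symbols defined in the text, is:

```latex
\min_{u_v(\cdot),\,u_p(\cdot)} \; J(u_v, u_p)
  = \int_{0}^{t_f} \Big[ \big(v_v(t) - v_v\big)^2
  + w_p\,\big(v_p(t) - v_p\big)^2 \Big] \, dt
\tag{1}
```

Here v_v and v_p (without time argument) denote the constant desired speeds, matching the text's convention; the exact normalization used by the authors may differ.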
To consider the trajectory of both agents, the system model is given by equations (2) and (3), where x_k, v_k, u_k and τ_k are the traveled distance, speed, acceleration and time delay of agent k, respectively. k ∈ {v, p} is an index that designates the road user: k = v for the vehicle and k = p for the pedestrian. The speed and acceleration capabilities of each agent are considered as system constraints: the maximum and minimum accelerations (decelerations) are denoted ū_k and u_k, respectively, and the maximum speed is v̄_k. Equation (4) prevents both agents from moving backward, but allows them to stop. ū_k, u_k and v̄_k are defined not only according to physical limitations but also by taking into account criteria such as the safety and convenience of goods or passengers on the vehicle (e.g., an autonomous shuttle) and maintenance costs. Their values can be defined either empirically or from the literature. The other important system constraint is safety: only one kind of agent is admitted in CZ. From [14], the access to CZ is safe if constraint (5) holds, where i denotes either the entry into or the exit from CZ. Equations (1)-(5), with the initial state of the system, give the complete formulation of the optimal control trajectory problem of both agents. Fig. 2 illustrates the results of two numerical examples of the optimal trajectories of both agents. For solving the optimization problem, Pontryagin's maximum principle with a time delay on the control, as presented in [15] and detailed in [16], is used. Both crossing sequences that result from equation (5) [17] are compared. The optimal speed profiles presented in Fig. 2 make both agents act with a kind of ''courtesy''. The first one speeds up its pace a little to let the second cross sooner, whereas the second decelerates earlier, meaning that it yields the way to the other agent. Both agents agree on a consensus time t_c that optimizes their common problem. t_c is the time when the first leaves CZ and the second enters it.
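Equations (2)-(5) were also lost in extraction. From the surrounding definitions (double-integrator dynamics with an input delay, acceleration and speed bounds, and mutual exclusion in CZ), a plausible reconstruction is:

```latex
% State dynamics of agent k \in \{v, p\} (equations (2)-(3)):
\dot{x}_k(t) = v_k(t), \qquad \dot{v}_k(t) = u_k(t - \tau_k)

% Acceleration bounds and the speed constraint (4):
\underline{u}_k \le u_k(t) \le \bar{u}_k, \qquad 0 \le v_k(t) \le \bar{v}_k

% Safety constraint (5): the CZ occupancy intervals must not overlap,
% with t_k^{enter} and t_k^{exit} the times agent k enters and exits CZ:
\big[\, t_v^{\mathrm{enter}},\, t_v^{\mathrm{exit}} \,\big]
  \cap
\big[\, t_p^{\mathrm{enter}},\, t_p^{\mathrm{exit}} \,\big] = \emptyset
```

The interval form of (5) is our reading of "access to CZ is safe" with i ∈ {enter, exit}; the authors' exact inequality may be stated pairwise on these times.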
It defines the compromise they both find, according to their desired speeds. The problem can easily be generalized to the crossing of a stream of CAVs. Since the scope of the paper is the CAV's speed profile, and to avoid the centralized computation overhead, we consider a decentralized optimization, where each CAV optimizes its own interaction with the pedestrian. If the CAV does not yield, it communicates the problem to the follower, and so on (see Fig. 3).
However, as can be noticed, solving the problem from the perspective of multi-agent interaction requires strong assumptions about the pedestrian behavior. Accurate estimates of v_p, v̄_p and τ_p are required. In addition, some unforeseeable events, such as a pedestrian fall, entirely call the multi-agent optimal control approach into question. The next subsection aims to keep the advantage of the approach while making it safer and more adjustable to the pedestrian behavior.

B. PRACTICAL CONSIDERATIONS
To adapt the initial problem statement to reality, three key design requirements should be carefully considered:
• Pedestrian safety: When the CAV prompts the pedestrian to enter CZ by displaying the green signal, it must be able to come to a complete stop before CZ as long as the pedestrian has not fully exited.
• Pedestrian convenience: Although both agents are assumed to seek a compromise, the pedestrian must be able to move freely, e.g., more slowly. In other words, the problem turns into the design of a single-agent control, adapted to the characteristics of the pedestrian: reaction time, speed, etc. The cooperative game is therefore only implicit.
• Traffic efficiency: Despite the two previous requirements, the CAV must carry the burden of crossing efficiency. Based on the solution given in Fig. 2, in addition to the displayed color, the CAV's speed profile should be expressive enough to invite the pedestrian to cross as early as possible, and should recover the lost time as much as possible once the pedestrian exits. To respect the first two requirements, the speed of the CAV is limited according to its distance to CZ before the pedestrian exits, as expressed in equation (7). This means that the actual distance to CZ is no less than the shortest braking distance plus a margin v_v(t) multiplied by a positive gain τ > τ_v.
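A hedged reading of constraint (7) can be sketched in code: before the pedestrian has exited, the CAV's distance to CZ must cover the shortest braking distance plus the margin τ·v_v(t). Function and parameter names (and default values) are ours, not the paper's:

```python
def speed_is_admissible(dist_to_cz, v, u_min=-3.0, tau=1.0):
    """Return True if the CAV, at speed v (m/s) and dist_to_cz (m) from CZ,
    can still come to a complete stop before CZ with a time margin tau (s).
    u_min is the minimum acceleration (maximum deceleration), assumed here."""
    braking_dist = v * v / (2.0 * abs(u_min))  # shortest stopping distance from v
    return dist_to_cz >= braking_dist + tau * v
```

A controller enforcing (7) would cap the speed so that this predicate stays true at every step while the pedestrian is in, or approaching, CZ.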
We draw the reader's attention to the fact that the cost of yielding the way is a subtle concept. Because the CAV maintains a safe distance to the pedestrian while she/he is in CZ (equation (7)), there is an optimal speed and gap that leave the CAV the furthest from CZ after the pedestrian exits. We call it the optimal safe state (S*). We use 5 s to let vehicles at all different initial speeds reach the desired speed; in this case, the furthest CAV has the highest running efficiency. Fig. 4 shows the relationship between the CAV's initial safe state (constrained by equation (7)) and the CAV state after 5 s. Combining the initial behavior (see Fig. 2) with the practical considerations, the single-agent control problem turns into cooperating with the pedestrian as follows:
• If the CAV goes first, then it displays the red signal, communicates the presence of the pedestrian to the following CAV, and frees the CZ as soon as possible for the pedestrian.
• If the CAV yields the way, then it invites the pedestrian to get into the CZ as soon as possible by providing a sufficient safety margin, and it must be at the optimal exit safe state when the pedestrian leaves CZ.
To meet these control design requirements, this paper uses two control approaches. The first is based on DRL. Indeed, DRL has shown powerful abilities in the field of autonomous driving, as discussed in [18] and [19]. It has also brought new and promising solutions to traffic control issues [20], [21]. More details about the DRL controller design are given in Section III. The second control approach is based on MPC and is only used to assess the effectiveness of DRL. The MPC controller is based on a rolling-horizon Quadratic Programming (QP) algorithm [22]. At each time step, the exit time of the pedestrian is computed from the position and the average speed of the pedestrian. The control algorithm computes the speed profile of the CAV [23] that allows it to reach the optimal control point with the minimum sum of quadratic gaps to the desired speed (equation (1)). When the optimal safe state is infeasible, the controller brings the CAV to a complete stop near the obstacle, applying the maximum deceleration u_v when the CAV is close to CZ.

III. DRL MODEL
A DRL model includes a DRL environment and a DRL agent. The two parts interact through observations, actions, and rewards when training the agent [24]. Focusing on our studied case, the DRL model is presented in Fig. 5 (refer to Section III-B for details on the DDPG agent). In the model, the essential parts include the DRL agent, the CAV dynamics, the pedestrian behaviors, the observations, and a reward function (see Fig. 5). For the CAV dynamics, this paper uses a simplified vehicle model based on longitudinal motion, as presented in [25].

A. MODEL OF THE PEDESTRIAN CROSSING BEHAVIORS
The model of a pedestrian agent is at the core of the training process. The model is designed as a fuzz testing approach [26] based on the optimal speed profile presented in Section II. We emphasize that the model of pedestrian agents does not aim to match the pedestrian behavior precisely. Instead, it aims to quicken the training of the CAV to make it play the cooperative game safely with the pedestrians.
One of the training objectives is respecting the stated cooperative game. In each scenario, the theoretical model evaluates whether the CAV must yield the way. During the training of the DRL agent, if the CAV respects this result, the pedestrian agent plays the cooperative game and respects the displayed signal introduced in Section I: it either speeds up its pace (green signal) or waits for the CAV (red signal). Otherwise, when the CAV does not play the cooperative game, the pedestrian agent takes a risk and does not respect the signal. In this case, the pedestrian either runs or crosses too slowly, according to the initial safety margin, at the red signal; at the green signal, the pedestrian waits for the CAV to pass first. More details about the behavior of the pedestrian agent are given in Fig. 6.
The other training objective is safety. Even if the CAV respects the cooperative game, unforeseeable events may happen. Fuzz testing aims to avoid collision risks and to randomize the pedestrian parameters. More precisely, there is a probability that the pedestrian suddenly stops completely in CZ. This probability is set to 0.03 per 0.5 s step, which yields roughly a 30% probability of stopping during each crossing. As a result, if the CAV does not respect the safe gap, a collision may happen. This mechanism is intended to train the CAV to react correctly when the pedestrian falls during the crossing. Besides, when the red color is displayed, if the safety margin is acceptable [27], the pedestrian continues the crossing with a probability of 10%. This allows the CAV to learn to stop when a dangerous pedestrian behavior is detected.
Other random variables are considered: the desired (comfort) speed of the pedestrian, the reaction time, and the safety margin are randomly defined according to data given by [9], [27], and [28], respectively. Because the pedestrian crossing tests are conducted through the VR headset, the pedestrian's speed is multiplied by 0.544. This value is the ratio observed empirically during the tests between the pedestrian's free walking speed in the immersive environment and the literature data.
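As an illustration, the fuzzed pedestrian sampler described above might look as follows. The 0.03 per-step stop probability, the 10% red-light continuation and the 0.544 VR speed ratio come from the text; the function names and the uniform distribution ranges are our assumptions (the paper draws these parameters from [9], [27] and [28]):

```python
import random

STOP_PROB_PER_STEP = 0.03    # per 0.5 s step; ~30% chance of stopping per crossing
RED_LIGHT_CROSS_PROB = 0.10  # continue crossing on red when the margin is acceptable
VR_SPEED_RATIO = 0.544       # empirical VR / literature walking-speed ratio

def sample_pedestrian(rng: random.Random) -> dict:
    """Draw one randomized pedestrian profile for training (ranges assumed)."""
    return {
        "desired_speed": rng.uniform(1.0, 1.6) * VR_SPEED_RATIO,  # m/s
        "reaction_time": rng.uniform(0.5, 1.5),                   # s
        "safety_margin": rng.uniform(1.0, 3.0),                   # s
    }

def stops_during_crossing(rng: random.Random, n_steps: int = 12) -> bool:
    """Apply the per-step sudden-stop fuzz over one crossing of n_steps steps."""
    return any(rng.random() < STOP_PROB_PER_STEP for _ in range(n_steps))
```

With roughly 12 half-second steps per crossing, 1 − 0.97^12 ≈ 0.31, consistent with the stated 30% per-crossing stop probability.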

B. DEEP REINFORCEMENT LEARNING
At each time step t (t = 0, 1, 2, 3, . . .), the DRL agent observes the state information s_t (s_t ∈ S), where S is the set of environment states (in this research, the pedestrian's position and speed and the CAV's position and speed), and then outputs an action u_t (u_t ∈ [u, ū] ⊆ U), where U is the scope of CAV accelerations and u_t is the longitudinal acceleration of the CAV. After a control cycle, the environment moves to the next state s_{t+1}. According to the defined reward function, a step reward r_{t+1} for the transition (s_t, u_t, s_{t+1}) is calculated and fed back to the agent. The step rewards are used to update the control policy π(u_t, s_t), which maps the current state to an optimal control output u_t. This process continues until a terminal state is reached, then the agent restarts. The objective of the agent is to maximize, from a long-term perspective, the accumulated reward defined in equation (8) in Section III-C.
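The observe-act-reward loop above can be sketched with a toy longitudinal environment and a constant-acceleration stand-in agent. Everything here is an illustrative assumption (class names, dynamics, reward), not the paper's code; a DDPG agent would replace `ConstantAgent` and store transitions in its replay buffer:

```python
class ToyCrossingEnv:
    """Toy CAV environment: longitudinal motion only; episode ends past x = 25 m."""
    def __init__(self, dt=0.5, v_des=10.0, u_min=-3.0, u_max=2.0):
        self.dt, self.v_des, self.u_min, self.u_max = dt, v_des, u_min, u_max
        self.reset()

    def reset(self):
        self.x, self.v = -50.0, 10.0
        return (self.x, self.v)

    def step(self, u):
        u = max(self.u_min, min(self.u_max, u))  # acceleration bounds
        self.v = max(0.0, self.v + u * self.dt)  # no moving backward
        self.x += self.v * self.dt
        r = -abs(self.v - self.v_des)            # penalize speed deviation
        done = self.x >= 25.0
        return (self.x, self.v), r, done

class ConstantAgent:
    def act(self, state):
        return 0.0                               # hold the current speed
    def observe(self, s, u, r, s_next):
        pass                                     # DDPG would store this transition

def run_episode(env, agent, max_steps=200):
    s = env.reset()
    total = 0.0
    for _ in range(max_steps):
        u = agent.act(s)
        s_next, r, done = env.step(u)
        agent.observe(s, u, r, s_next)
        total += r
        s = s_next
        if done:
            break
    return total
```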
In this article, a DDPG agent is selected to tackle the problem, given the demand for continuous observation and action spaces. DDPG is an off-policy and entirely model-free method. Detailed information about the DDPG agent and its variables can be found in [29]. The specific structure of the selected agent is shown in Fig. 5. In each fully connected layer (except the sixth layer in the critic network), the number of neurons is 48. The observations are L (the safety margin of equation (7)), v_p(t), and f_p(t) (see equation (8)).
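The actor side of such an agent can be sketched as a small fully connected network whose output is squashed and rescaled to the admissible acceleration range. The 48-unit layer width mirrors the paper; the number of layers, activations and initialization are assumptions, and a real DDPG actor would be trained rather than randomly initialized:

```python
import math, random

def dense(x, W, b, act=None):
    """One fully connected layer: y = act(Wx + b)."""
    y = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [act(v) for v in y] if act else y

class Actor:
    """Toy actor: observation vector -> acceleration in [u_min, u_max]."""
    def __init__(self, obs_dim, u_min, u_max, hidden=48, rng=None):
        rng = rng or random.Random(0)
        def mat(r, c):
            return [[rng.uniform(-0.1, 0.1) for _ in range(c)] for _ in range(r)]
        self.W1, self.b1 = mat(hidden, obs_dim), [0.0] * hidden
        self.W2, self.b2 = mat(hidden, hidden), [0.0] * hidden
        self.W3, self.b3 = mat(1, hidden), [0.0]
        self.u_min, self.u_max = u_min, u_max

    def act(self, obs):
        h = dense(obs, self.W1, self.b1, act=math.tanh)
        h = dense(h, self.W2, self.b2, act=math.tanh)
        raw = math.tanh(dense(h, self.W3, self.b3)[0])  # in (-1, 1)
        # rescale to the admissible acceleration range [u_min, u_max]
        return self.u_min + (raw + 1.0) / 2.0 * (self.u_max - self.u_min)
```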

C. REWARD FUNCTION
To optimize the problem described in Section II, we define equation (8) as the reward function used to train the DDPG agent. In equation (8), the reward r(t) consists of three parts, r_1, r_2 and r_3, based on several considerations.
f_k(t) = {0 before agent k enters CZ; 1 after agent k enters CZ}   (8)
r_1 represents the safety constraint formulated by equation (7). When the system is safe (L ≥ 0), it contributes a positive reward of 10. Otherwise, a negative reward is given according to the degree of danger. K_1 (> 0) is the weight coefficient; because of the importance of safety, a relatively large value should be given to K_1. Besides, r_1 is used only in the case where the pedestrian passes CZ first; r_1 = 0 if the pedestrian chooses to wait for the vehicle to pass first. r_2 is used to optimize the objective function (1) from the perspective of the CAV. It gives a positive reward of 2 when the CAV speed is close to the desired speed. To guide the agent to explore the action space efficiently, we set a graded negative reward for the deviation of the CAV speed from the desired speed. K_2 (> 0) is the weight coefficient. r_3 is used to optimize the objective function (1) from the perspective of the pedestrian. According to the analysis in Section III-A, the vehicle behavior affects the choice of pedestrian behavior. Therefore, the pedestrian behavior should also be evaluated by the reward function to optimize the whole system. The definition of r_3 is based on the pedestrian behavior model built in Section III-A. For a better explanation, we use a counter function f_k(t), with k ∈ {v, p} representing the vehicle or the pedestrian. In r_3, K_3 (< 0) is the weight coefficient. In the beginning, due to the reaction time (t_r), the pedestrian does not know whether to enter CZ first or wait. However, regardless of the passing order, a negative reward is given to reduce the waiting time of the pedestrian before entering CZ; therefore, the real-time reward r_3 is constant while f_p(t) is 0. When the pedestrian enters CZ (f_p(t) = 1), the order of passage is known. If the vehicle has already entered (f_v(t) = 1), the crossing process can be considered over, because in the case of the vehicle passing first, only the waiting time of the pedestrian needs to be considered. If the CAV has not yet entered, a negative reward value is given when the pedestrian passes normally (v_p(t) > 0) to punish the deviation between the real speed and the desired speed. This ensures consistency with equation (1) to a certain extent. Finally, if the pedestrian does not pass normally but stops in CZ for any reason, we give a reward of 0, because a punishment would be meaningless in this case.
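The r_1 + r_2 + r_3 decomposition above can be sketched as follows. The branch structure follows the text; the weight values (K_1, K_2, K_3), the waiting penalty, and the speed-deviation threshold are our assumptions, not the paper's tuned values:

```python
def step_reward(L, v_v, v_v_des, v_p, v_p_des, f_p, f_v,
                ped_goes_first=True, K1=20.0, K2=1.0, K3=-0.5):
    """Illustrative step reward r = r1 + r2 + r3 (weights assumed)."""
    # r1: safety term, active only when the pedestrian passes CZ first
    if not ped_goes_first:
        r1 = 0.0
    elif L >= 0:
        r1 = 10.0                        # system is safe
    else:
        r1 = K1 * L                      # negative, grows with the danger level
    # r2: CAV speed tracking (threshold 0.5 m/s assumed for "close")
    dv = abs(v_v - v_v_des)
    r2 = 2.0 if dv < 0.5 else -K2 * dv
    # r3: pedestrian-side term, driven by the counter function f_k(t)
    if f_p == 0:
        r3 = -0.5                        # constant waiting penalty before CZ entry
    elif f_v == 1:
        r3 = 0.0                         # vehicle already in: crossing is over
    elif v_p > 0:
        r3 = K3 * abs(v_p - v_p_des)     # K3 < 0: punish speed deviation
    else:
        r3 = 0.0                         # pedestrian stopped in CZ: no punishment
    return r1 + r2 + r3
```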
The rewards r_1, r_2 and r_3 represent the features analyzed in Section II. These rewards serve as training signals to select appropriate behaviors in the context of the desired task. The selection of this reward function includes careful consideration of the vehicle and pedestrian behaviors. From the training results of many attempts, we have found it difficult to make the reward converge when only the CAV's actions are considered independently: generally, there are collisions, or the vehicle always slows down until it stops completely to let the pedestrian go first. However, with this reward function, the test results in Section IV show that the CAV cooperates with the pedestrian to ensure safety and improve crossing efficiency. More interestingly, the CAV can choose the appropriate crossing order according to the actual situation rather than always slowing down and waiting for the pedestrian. A CAV with such intelligent behavior makes decisions like humans do when driving. Therefore, the reward function proposed in this paper is effective for the training.

IV. EXPERIMENTS AND ANALYSIS
According to the above model, in this section we test:
• CAV's ability to show appropriate driving behaviors. This includes cooperating with the pedestrian to choose the crossing order, pedestrian-friendly driving behavior, and dealing with dangerous pedestrian behaviors.
• CAV's ability to reach the ''optimal state'' when the pedestrian exits the lane, as analyzed in Section II-B. This reveals the combination of crossing efficiency and safety.
• The generality of our method when the virtual pedestrian simulation is replaced by real testers.
• The advantage of the DRL controller when compared with the method of QP in our studied case.
The coordinate system is established as in Fig. 1. In all the experiments, we set some invariant parameters (e.g., l_r = 4 m). The DDPG agent is realized in MATLAB, and the DRL environment is built in SIMULINK. The structure of the two networks is shown in Fig. 5. The sample time is set to 0.5 s. The learning rates of the critic and the actor are set to 0.001 and 0.0001, respectively. The experience pool size is set to 10000 for the two networks. The algorithm used for training the actor and critic function approximators is adaptive moment estimation (Adam). The loss function adopted for learning is the mean squared error (MSE). Moreover, the discount factor applied to future rewards during training and the exploration noise are set to 0.99 and 0.6, respectively. After many tests, a suitable batch size of 64 is chosen. A training episode terminates when x_p(t) ≥ 4 and x_v(t) ≥ 25. After 8000 training episodes, the reward converges gradually, as shown in Fig. 7.
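For reference, the training hyperparameters reported above can be collected in one place; the dictionary keys are our naming, the values are those stated in the text:

```python
# DDPG training setup as reported in the paper (key names are ours).
DDPG_CONFIG = {
    "sample_time_s": 0.5,
    "critic_lr": 1e-3,
    "actor_lr": 1e-4,
    "replay_buffer_size": 10_000,
    "optimizer": "Adam",
    "loss": "MSE",
    "discount_factor": 0.99,
    "exploration_noise": 0.6,
    "batch_size": 64,
    "episodes": 8000,
    # episode terminates when x_p >= 4 m and x_v >= 25 m
    "terminal_x_p_m": 4.0,
    "terminal_x_v_m": 25.0,
}
```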
To compare the experimental results with the theoretical optimal solution, the paper uses the QP method to calculate the optimal control for CAVs. The optimal state S* is set as the goal to reach at the moment when the pedestrian exits CZ, and the same constraints (equations (3), (4) and (7)) are applied to the QP solver. The control cycle is set to 0.5 s. In this way, the solver provides a solution that consists of the optimal control values at each step. We use the first value to control the CAV. When a new cycle starts, the program recalculates the optimal control values. This cycle continues until the pedestrian leaves the CZ.
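The receding-horizon idea can be sketched as follows. For illustration only, a grid search over constant accelerations stands in for the QP solver (the paper solves a proper QP); the cost combines the running speed-deviation term of equation (1) with a terminal penalty toward the optimal safe state, and only the first control value is applied before re-planning:

```python
def mpc_step(x, v, target_x, target_v, t_exit, v_des,
             u_min=-3.0, u_max=2.0, v_max=12.0, dt=0.5):
    """One receding-horizon cycle: return the acceleration to apply now.
    Grid search over constant accelerations replaces the QP purely for
    illustration; bounds and targets are assumed values."""
    n = max(1, int(round(t_exit / dt)))       # steps until the pedestrian exits
    best_u, best_cost = 0.0, float("inf")
    for i in range(41):                       # candidate constant accelerations
        u = u_min + i * (u_max - u_min) / 40
        xs, vs, cost, feasible = x, v, 0.0, True
        for _ in range(n):
            vs = vs + u * dt
            if not (0.0 <= vs <= v_max):      # speed constraint (4)
                feasible = False
                break
            xs = xs + vs * dt
            cost += (vs - v_des) ** 2 * dt    # running cost of equation (1)
        if not feasible:
            continue
        # terminal cost: distance to the optimal safe state at pedestrian exit
        cost += (xs - target_x) ** 2 + (vs - target_v) ** 2
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u                             # apply, then re-plan next cycle
```

In the Fig. 9 scenario (CAV at (−50 m, 10 m/s), target state near (−19 m, 5.3 m/s) at pedestrian exit), this sketch yields a mild deceleration, qualitatively matching the reported slowing-down behavior.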

A. CAV DEALING WITH NORMAL PEDESTRIAN BEHAVIORS
The purpose is to test CAV's ability to show appropriate driving behaviors and to reach the ''optimal state''. The results are shown in Fig. 8 and Fig. 9.
In Fig. 8, the initial state of the CAV is (−20 m, 10 m/s). The theoretical cooperative game is in favor of the CAV, so the pedestrian has to wait and the CAV should go first. Fig. 8b shows that the pedestrian waits for the CAV, and the CAV shows the red signal to the pedestrian and passes first. Fig. 8a shows that the CAV decelerates during the first 0.7 s. This behavior is perfectly reasonable, because the pedestrian may enter CZ even on red. If the pedestrian chooses to enter, such a deceleration of the CAV makes the subsequent braking distance sufficient to ensure safety. If not, as in this case, the CAV accelerates with the maximum acceleration until it exits CZ, to shorten the pedestrian's waiting time (3.5 s in this case). The CAV does not exceed its maximum speed v̄_v during the whole process, and it cooperates with the pedestrian to complete the crossing task. This kind of cooperative behavior ensures traffic safety and improves passing efficiency. In this case, the QP method is not compared because it cannot select the passing order automatically according to the state of the pedestrian.
In Fig. 9, the initial state of the CAV is (−50 m, 10 m/s). From the perspective of the CAV, the distance to CZ is enough to let the pedestrian cross first. In Fig. 9a, the CAV slows down until the pedestrian enters the CZ at 2 s. The sharp deceleration in this phase helps the pedestrian understand the CAV's intention more quickly, according to the conclusion in [9]. Besides, it delays the time of entering CZ to allow enough time for the pedestrian to pass through. During the pedestrian crossing (2 s to 5.3 s), the CAV continually adjusts its speed so as to reach the optimal state when the pedestrian releases CZ. Fig. 9b shows the state when the pedestrian exits: the CAV under DRL is at (−14.3 m, 6.8 m/s) and the CAV under QP is at (−19 m, 5.3 m/s). From the perspective of the pedestrian, after observing the large distance (35 m) and the low speed (4.5 m/s), he decides to enter CZ first at 2 s and finally exits at 5.4 s. QP fails to reach the optimal state because the theoretical pedestrian speed is smaller than the simulated pedestrian speed. Besides, Fig. 9b shows that the speed curves of both methods lie in the safe area before the pedestrian exits; hence, both methods give solutions that ensure pedestrian safety. However, DRL shows higher traffic efficiency than QP, being about 8 m ahead of QP when both reach v̄_v at 8 s (see Figs. 9a and 9c). Therefore, the CAV controlled by DRL shows a more efficient driving ability.

B. CAV DEALING WITH DANGEROUS PEDESTRIAN BEHAVIORS: SAFETY TEST
The purpose of this experiment is to test whether there will be a collision if the pedestrian stops in CZ at any time during the crossing. To do so, the pedestrian stops moving as soon as he enters CZ, or just before exiting, respectively. The results are shown in Fig. 10 and Fig. 11.
In Fig. 10, the initial state of the CAV is (−50 m, 10 m/s). Fig. 10a shows that the pedestrian stops moving as soon as he enters CZ and waits until the speed of the CAV is 0. When the vehicle stops, the pedestrian moves again, at 11 s. The pedestrian exits CZ at 15 s and the CAV enters after the pedestrian exits (see Fig. 10c). Hence, there is no collision between the two agents for either method. Fig. 10b shows that the two methods ensure the pedestrian's safety. However, the difference is that DRL chooses to stop at −10 m, while QP chooses to stop at −27.5 m. Additionally, DRL has a higher traffic efficiency because it is about 5 m ahead of QP when they reach v̄_v at 19 s (see Figs. 10a and 10c). The CAV dynamically interacts with the pedestrian's movement. When the pedestrian stops at the beginning, the DRL-controlled CAV slows down and stops at −10 m instead of 0 to leave a buffer. When the pedestrian moves again, the CAV ''cleverly'' uses this buffer to accelerate and reach a relatively higher speed (3 m/s) instead of 0 when the pedestrian releases CZ. This behavior takes into account both the CAV's position and speed to achieve higher traffic efficiency.
In Fig. 11, the initial state of the CAV is (−50 m, 10 m/s). The pedestrian crosses normally at the beginning but stops before exiting, as shown in Figs. 11a and 11c. In this case, the strategy chosen by both methods is to stop around 0 to wait for the pedestrian (see Fig. 11b). As shown in Fig. 11b, DRL strictly keeps the safety constraint while the pedestrian is in CZ; the QP solution gives a similar result. Unlike the previous case, the pedestrian may exit CZ at any time. Therefore, the DRL-controlled CAV decides to stop at 0 to be ready to enter CZ as soon as the pedestrian exits. This action reduces the time gap between the pedestrian's exit and the CAV's entry.

C. CAV DEALING WITH REAL PEDESTRIAN BEHAVIORS
This experiment aims to test whether the developed algorithm is applicable to real pedestrians. Tests based on a Virtual Reality (VR) helmet have been performed.
Testers participate in the cooperative game through the VR helmet, which provides immersive 3D scenes to bring the experiment close to the real situation and increase the consistency between the experiment and the real case. Fig. 12 shows the test environment. Inside the Guardian (predefined boundary), testers wearing the helmet interact with the virtual environment. In the virtual scene, the CAV communicates its intention to the pedestrian through its signal light. The participants were visitors of UTBM's open day on February 2nd, 2020, so the recruitment was opportunistic. 28 virtual road crossings were achieved by 13 female and 15 male testers aged from 7 to 62 years. Three testers were younger than 18 years old, and one tester was older than 50 years old. The testers had to physically cross the virtual road presented in Fig. 12. Note that there is no virtual avatar of the pedestrian in the scene. The street is a two-lane road, with the second lane left empty to measure the pedestrians' free walking speed. When the virtual CAV observes a pedestrian near CZ, it computes the optimal cooperative game solution and behaves accordingly. Fig. 13 compares the free walking speeds with the crossing speeds. Among the 28 virtual crossings, only 14 free walking records are usable because some testers ended the experiment early. All computed speeds are represented in the figure. The free walking speeds are presented by markers on the left of the figure, whereas the crossing speeds are displayed on the right. The rectangles near each set of markers are colored according to the speed distribution. Each line linking two markers shows the difference between the free walking speed and the crossing speed of the same tester. One can note that all testers, except two of them, play the cooperative game by speeding up their pace when they cross CZ.
On average, the crossing speed is 24.73% higher than the free walking speed for the 14 testers.
Another key performance indicator is the speed of the CAV when the tester enters CZ. With a usual car speed profile, it is reported in [30] that 46% of pedestrians wait until the car stops. In the conducted tests, only three of the 28 testers waited for a complete stop of the CAV before entering CZ. The average speed of the CAV when the tester crosses equals 6.70 m/s. Both the signalization displayed by the CAV and its early deceleration (−3 m/s²) contribute to the time gain. Fig. 14 shows a crossing result of a real tester. The initial state of the CAV is (−50 m, 10 m/s). In Fig. 14a, the CAV shows a green light and slows down to 3.7 m/s at 2.5 s to ''tell'' the pedestrian to cross first; the tester enters at 3 s and exits at 7.1 s. The speed curve of the CAV in Fig. 14a is similar to Fig. 9a in Experiment 2, which simulates a similar case. Fig. 14b shows that the CAV of DRL reaches the state (−11.7 m, 5.8 m/s) and that of QP (−12.2 m, 5.5 m/s) when the tester exits. Both of them are very close to the optimal state p2 (in Fig. 4) and meet the safety constraint before the exit of the pedestrian. However, the small difference makes DRL about 4 m ahead of QP when they both reach the desired speed at 10 s (see Fig. 9a and Fig. 9c). Hence, in this experiment, DRL outperforms QP in terms of crossing efficiency. Table 1 shows the statistics of the 28 experiments based on the testers' speed profiles. In these experiments, the 28 testers crossed the road naturally without deliberately stopping in CZ. The average final position indicates that DRL is about 2 m ahead of QP on average after reaching the desired speed. Furthermore, lower STD values indicate that DRL is more stable in the pursuit of high crossing efficiency. One of the reasons is that QP fails thirteen times (46%) to provide a solution; in those cases, the CAV longitudinal control resorts to a classical slowdown behavior near the obstacle.
Besides, to highlight the advantages of the proposed approach in terms of efficiency, a comparison is performed with the classical slowdown behavior. In this comparison, we assume that the 28 testers keep their safety margin for crossing CZ as well as their crossing speed in both scenarios, i.e., DRL and the classical way. As presented in Table 1, the pedestrian crossing delays the CAV by 6.21 s ± 3.03 s on average in the classical scenario, whereas in the DRL scenario, the delay equals 3.85 s ± 2.20 s. The DDPG agent thus allows the CAV to gain, on average, 2.36 s (38.01%). The crossing of the CAV delays the pedestrian by 5.25 s ± 1.03 s in the classical scenario, whereas in the DRL scenario, the delay equals 3.53 s ± 1.60 s. The DDPG agent allows the pedestrian to gain, on average, 1.72 s (32.76%). The large standard deviation is due to 6 of the 28 crossings having a low gain. More precisely, there are 4 crossings where the CAV was initially far from CZ and 2 crossings where the pedestrian decision-making was based only on the displayed green light, without waiting for the CAV to begin the slowdown process.
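The relative gains quoted above follow directly from the average delays; a minimal Python sketch of the arithmetic, using only the rounded values reported in Table 1:

```python
# Average crossing delays from Table 1 (seconds, rounded values).
classical_cav_delay = 6.21   # pedestrian delays the CAV, classical scenario
drl_cav_delay = 3.85         # same delay under the DRL controller
classical_ped_delay = 5.25   # CAV delays the pedestrian, classical scenario
drl_ped_delay = 3.53         # same delay under the DRL controller

def gain(classical: float, drl: float) -> tuple[float, float]:
    """Absolute and relative time saved by DRL versus the classical slowdown."""
    saved = classical - drl
    return saved, 100.0 * saved / classical

cav_saved, cav_pct = gain(classical_cav_delay, drl_cav_delay)
ped_saved, ped_pct = gain(classical_ped_delay, drl_ped_delay)
print(f"CAV: {cav_saved:.2f} s saved ({cav_pct:.2f}%)")
print(f"Pedestrian: {ped_saved:.2f} s saved ({ped_pct:.2f}%)")
```

Note that recomputing from the rounded averages gives 38.00% for the CAV (the paper quotes 38.01%, presumably from unrounded data) and 32.76% for the pedestrian.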
There are two advantages of DRL. First, the DDPG agent invites the pedestrian to cross CZ earlier, which also allows the pedestrian to reduce the waiting time. Second, it prepares the exit conditions so that the CAV is close to the optimal state when the pedestrian exits. This process is drawn in Fig. 15, where three control methods are compared. With the classical behavior, the CAV keeps its desired speed until it approaches CZ and then slows down. This behavior delays the time at which pedestrians enter CZ. On the contrary, in our proposed method, the CAV prepares earlier, with a low driving speed and a safety margin that invite the pedestrian to cross as early as possible. Besides, the CAV is closer to the optimal state when the pedestrian leaves. This attribute helps the CAV reduce its driving distance loss.
To sum up, first, the results of the above experiments show that the CAV with the DRL controller selects the appropriate behavior in the face of different situations. Second, the CAV with the DRL controller can reach the optimal state, especially when the pedestrian crosses normally. Third, the experiments with actual participants show that the model trained in this paper generalizes. Finally, according to the results of the above experiments, the DRL controller has an advantage over QP in terms of crossing efficiency.

V. CONCLUSION
This work uses automated driving to improve the safety and efficiency of CAV and pedestrian crossings at industrial sites. It explores novel speed profiles of CAVs to allow better coordination with pedestrians. The proposed approach is based on a common cooperative game played by pedestrians and CAVs. Both agents play the cooperative game to stay as close as possible to their desired speeds. The experiments, involving testers in an immersive environment, show that humans are generally ready to play the cooperative game. The use of DRL aims to handle the randomness of human behaviors during the cooperative game. The simulations show that the coordination between the DRL agent and humans makes the CAV invite the pedestrian to cross as early as possible to save time, keep a safe distance to prevent a collision, and plan to gain some meters after the exit of the pedestrian. The proposed approach thus provides a clear time gain.
Two approaches are compared: DRL and QP. Based on the human testers' speed profiles, DRL shows its ability to be slightly more efficient. Indeed, at each iteration, QP must recompute the CAV acceleration according to the new state of the pedestrian. More precisely, QP obtains the optimal solution only when the problem is feasible. Hence, feasibility issues arise when the pedestrian does not respect the exit time exactly or when the initial state of the CAV does not allow reaching the optimal state. Moreover, QP adds a computation overhead that questions the suitability of the response time for the studied real-time system, while DRL does not suffer from these issues.
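The per-iteration QP recomputation with a fallback on infeasibility can be sketched as a receding-horizon loop. The snippet below is illustrative only: the paper's actual QP formulation is not reproduced, so `solve_qp` is replaced by a toy constant-acceleration feasibility check, and the actuator bounds and slowdown distance are assumed values.

```python
class Infeasible(Exception):
    """Raised when no admissible acceleration satisfies the constraints."""

def solve_qp(x, v, t_exit, a_min=-3.0, a_max=2.0):
    # Toy stand-in for the QP: the constant acceleration that brings the
    # CAV from (x, v) to the CZ entry (position 0) exactly at the
    # pedestrian exit time t_exit. Infeasible if outside actuator bounds.
    a = 2.0 * (-x - v * t_exit) / (t_exit ** 2)
    if not (a_min <= a <= a_max):
        raise Infeasible
    return a

def classical_slowdown(x, v, a_min=-3.0):
    # Fallback behavior: brake when close to the obstacle, else coast.
    return a_min if x > -20.0 and v > 0.0 else 0.0

def control_step(x, v, t_exit):
    """One control iteration: try the QP, fall back if infeasible."""
    try:
        return solve_qp(x, v, t_exit)
    except Infeasible:
        return classical_slowdown(x, v)
```

For example, from the initial state (−50 m, 10 m/s) with 8 s before the pedestrian exit, the toy QP returns a mild deceleration; from (−5 m, 10 m/s) with only 1 s left, it is infeasible and the loop resorts to the classical slowdown, mirroring the 46% fallback rate reported above.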
The advantage of DRL is that it can be trained with several randomly drawn pedestrian profiles coupled with unexpected behaviors. The DDPG, despite the relatively long training time due to the different assumed pedestrian profiles, yields a stable and predictable behavior. If the DDPG agent is well tuned and trained, our tests show that DRL can respond to the pedestrian agent's various states, such as stopping or disobeying traffic signals. For instance, even when the pedestrian comes to a complete stop in the middle of the road (a fall, the loss of a valuable object, etc.), the behavior of DRL remains safe and efficient.
This paper highlights two reasons that advocate the investigation of CAV speed profiles when it comes to pedestrian crossings at industrial sites. First, the speed profile of CAVs determines the value of the key variables that pedestrians use to make their decision. Second, crossing safety and performance result from the chosen speed profiles of both agents (CAVs and pedestrians). The speed profile exhibits new opportunities that only autonomous driving can achieve. The presented results show that CAV speed profiles deserve to be thoroughly considered in the field of CAV/pedestrian coordination at industrial sites. They encourage continuing research on CAV trajectory control to improve coordination with pedestrians. This opens the research field of coordination analysis with different crossing conditions (vehicle categories, number of lanes to cross, etc.) coupled with incentive signage at industrial sites.
Although the work focuses on the industrial environment, where time saving is needed and pedestrians are trained, the encouraging results invite extending the approach to other environments. To this end, further studies must be conducted with caution, especially in the case where the vehicle does not yield the way. This case happens in practice, at least because it may be physically impossible for the CAV to come to a complete stop before CZ. Even if the trained DRL decelerates a little before freeing CZ as soon as possible (see Fig. 8), more research is needed to make the priority denial readable in an extended environment, not only through the speed profile but also through an effective signalization. A simpler extension is an adaptive traffic light triggered by pedestrians pressing the button: the vehicle adapts its speed to save time when it is allowed to pass with respect to the pedestrian phase.
ABDELJALIL ABBAS-TURKI received the Ph.D. degree in control and computer science from the Belfort-Montbéliard University of Technology (UTBM), France, in 2003. He is currently an Associate Professor with CIAD, UBFC, UTBM. He is involved in many bus rapid transit projects and in traffic flow design at industrial sites. He is also involved in developing cooperative intersection management of autonomous vehicles to encourage new mobility systems. He was the Head of the X.icars demonstration at ITS WC 2015. His research interests include discrete event dynamic systems, combinatorial optimization, control theory, and artificial intelligence applied to traffic modeling and control.
YAZAN MUALLA received the Ph.D. degree in computer science from the University of Bourgogne Franche-Comté (UBFC), France. From 2010 to 2014, he was a Field Engineer at the Drilling and Measurements Segment, Schlumberger International Company. He is an Associate Professor with the University of Technology of Belfort-Montbéliard (UTBM), France. His research interests include multi-agent systems, explainable AI, human-computer interaction, and intelligent transport systems.
ABDERRAFIAA KOUKAM received the Ph.D. degree in computer science from the University of Nancy I, France, in 1990. From 1986 to 1989, he was an Assistant Professor at the University of Nancy I, France, and a Researcher at the Centre de Recherche en Informatique de Nancy (CRIN), from 1985 to 1990. In 1999, he received the Habilitation (accreditation to supervise research) in computer science from the University of Bourgogne, France. He is currently a Professor of computer science with the Université de Technologie de Belfort-Montbéliard (UTBM), where he is the Director of the Systems and Transportation Laboratory. His research interests include the modeling and analysis of complex systems, including software and knowledge engineering, multi-agent systems, and optimization, and he has been involved in several international and industrial projects.
XIAOWEI TU received the bachelor's degree in computer science from Shanghai Jiao Tong University, and the master's and Ph.D. degrees in computer vision and image processing from the Compiègne University of Technology, France, in 1978 and 1987, respectively. He is currently a Professor with the Mechatronics and Automation Institute, Shanghai University. His research interests include sensors and control systems for autonomous vehicles, real-time vision system design, visual servoing, augmented and virtual reality systems, and the integration of automation systems for industrial applications.