Review of Decision-Making and Planning Approaches in Automated Driving

The number of research papers on decision-making systems in automated driving has increased significantly over the last few years. Decision-making for automated driving can be performed at different levels: (i) strategic level: generating the optimal route up to the destination; (ii) tactical level: identifying and ranking feasible high-level maneuvers that the vehicle can perform, considering the dynamic objects that are in the surroundings; (iii) operational level: generating a collision-free trajectory (path and speed profile) up to the planning horizon; (iv) stability level: computing the motion control commands for tracking the trajectory. Additionally, supervision can be understood as a combination of one or more decision-making levels. Previous reviews have focused either on one of the levels of decision-making or on a specific environment where the approaches are applied, without any distinction between the contexts in which they are applied (robotics, unmanned vehicles or automated driving). This review studies the state-of-the-art on the decision-making approaches applied specifically to automated driving, during the last lustrum.

found in planning, supervision and control systems. In this work we present the state-of-the-art on the planning approaches for decision-making, focusing on the automated driving application and not in the robotics domain. Thus, the related works during the last lustrum were studied in this article.

The associate editor coordinating the review of this manuscript and approving it for publication was Zheng Chen.

PLANNING APPROACHES
Different classifications of decision-making in general and planning approaches in particular have been presented in the automated driving field. In this chapter, first, a classification based on the type of planning task ordered by computation time is presented; and second, some of the most common classifications are presented.

COMPUTATIONAL TIME
Automated driving tasks in which decision-making can be found are planning tasks, control tasks and even supervision tasks. Planning tasks can be divided into route (or mission) planning, maneuver (or behavioral) planning, and motion (or trajectory) planning.

A classification of these decision-making tasks is presented in Figure 1. The figure shows a pyramid in which the levels are ordered by the time consumption of the decision-making tasks performed at each one, that is, from long-term to short-term. This pyramid is inspired by the classification of decision-making in management [2], which divides decision-making into strategic, tactical, and operational levels. In the proposed classification, the strategic level corresponds to route planning, the tactical level corresponds to maneuver planning, and the operational level corresponds

as input, performs the perception and planning tasks to generate the planned trajectories as output of the neural network. (iii) Mid-to-end: The neural network receives the perception data as input, and both planning and control tasks are performed inside the neural network, whose outputs are the control commands. (iv) Mid-to-mid: The neural network receives the perception data as input and generates as output the planned trajectories for the control stage.

The main contribution of this review is to present the most relevant works on decision-making for automated driving during the last lustrum, dividing the contributions into different planning levels according to the decision-making pyramid presented in Figure 1, as summarized in Table 1.
The rest of the paper is organized as follows: Section II introduces the review of route planning approaches (strategic level), Section III presents the review of maneuver planning approaches (tactical level), Section IV shows the review of motion planning (operational level), Section V briefly describes the trajectory tracking stage or control (stability level), and finally Section VI summarizes the conclusions of this review and the trends of decision-making in automated driving.

Route (or mission) planning corresponds to the strategic level, which is the highest level of the decision-making pyramid presented in Figure 1. It computes the sequence of waypoints from an origin to a destination point. This process is also called global planning, since all information about the map is known in advance. In local planning, by contrast, most of the information about the map and environment is unknown before the vehicle starts moving. Therefore, in global planning the route or itinerary is planned up to the destination, whereas in local planning the trajectory is computed up to a time horizon (five seconds is the most common horizon for motion planning in the state-of-the-art).

The route or mission generated by the route planner is at the top of the planning levels since this route is further used by the maneuver planner (to plan the next sequences of maneuvers to perform) and by the motion planner (to plan the geometric path and speed profile to be tracked by the vehicle). That is, route planning is the least reactive stage of the planning architecture: the short-term behavior of the surrounding vehicles will have a lower impact on the route than on the trajectory. The latter may change to avoid collisions with other road users. For this reason, the route planning process does not need to be recomputed with a high frequency; a reasonable execution period is around a few seconds.

The first section of Table 1 summarizes the most relevant route planning publications described below, which are classified in more detail in Table 2.

Route planning is one of the main tasks of the Vehicle Route Planning Problem (VRP), which consists of optimization problems found in the transportation, distribution, and logistics industries [51]. VRP is an NP-hard combinatorial optimization problem [52]. The main classification of VRP

The exact algorithms aim to obtain an optimal solution for the VRP. The scope of these algorithms is small-scale problems, as they would not be efficient in large-scale problems, such as planning the route between different continents,

Integer Linear Programming (ILP) is a linear optimization formulation in which some or all of the variables are restricted to integer values. The most common ILP algorithms for route planning are set partitioning and column generation algorithms. Angelelli et al. [9] proposed a linear programming model as a route planning approach to minimize both user travel inconvenience and traffic jams. Their approach consists of optimizing the travel time instead of generating the shortest route in terms of distance. Rahmani et al. [13] studied the accuracy of the predicted travel times and proposed a solution based on a fixed-point formulation of the simultaneous path inference and travel time prediction problem. Lee et al. [14] focused on the evaluation of travel-time reliability and proposed a measurement method based on the Gini coefficient, which is a well-known measure of statistical dispersion.
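As a rough illustration of the dispersion measure mentioned above, the Gini coefficient of a set of observed travel times can be computed from the mean absolute difference between all pairs of samples. The function name and the sample values below are hypothetical, and this is only a minimal sketch of the generic statistic, not the measurement method of [14].

```python
def gini_coefficient(travel_times):
    """Gini coefficient of non-negative samples (0 means all samples are equal)."""
    n = len(travel_times)
    if n == 0:
        raise ValueError("at least one sample is required")
    mean = sum(travel_times) / n
    if mean == 0:
        return 0.0
    # Sum of absolute differences over all ordered pairs (i, j).
    abs_diff_sum = sum(abs(a - b) for a in travel_times for b in travel_times)
    return abs_diff_sum / (2 * n * n * mean)


# Hypothetical travel times (minutes) observed on the same route.
reliable = [20.0, 20.0, 20.0, 20.0]
unreliable = [10.0, 20.0, 30.0, 60.0]
print(gini_coefficient(reliable))
print(gini_coefficient(unreliable))
```

Under this reading, a value close to zero would indicate very consistent travel times, while larger values would indicate a less reliable route.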

Approximate algorithms (bottom part of Figure 3) can be divided into two categories: fully heuristics-based approaches or hybrid approaches, which combine exact algorithms and heuristics.

Heuristics are basic approximate algorithms that find, in a reasonable computation time, a solution that is as good as possible but not guaranteed to be optimal.

In the same way, heuristics can be divided into classical heuristics and metaheuristics. Classical heuristics can be classified into constructive heuristics, improvement heuristics and 2-phase heuristics. Constructive heuristics include the following types of heuristics: saving heuristic, route-first cluster-second, cluster-first route-second, and insertion heuristics. (i) Saving heuristic: This solves the problem in which the number of vehicles is not fixed. It generates n routes, each consisting of only one starting vertex and one ending vertex. It then computes the saving obtained by combining each pair of routes and sorts these values. (ii) Nearest neighbor method: This starts from the starting vertex and searches for the nearest unvisited customer (destination vertex) as the next customer (destination). This procedure is repeated, as long as the capacity limit is not exceeded, until all customers (destination vertices) are visited. (iii) Insertion heuristics: This starts from a single node, which is usually called a seed node, and which forms the initial route from the depot. The remaining nodes are then inserted one at a time, evaluating certain parameters to select both the node and its insertion position in the route.

Two-phase heuristic algorithms consist of a cluster phase and a route construction phase. They can be considered as subtypes of constructive heuristics. One example of a two-phase heuristic is the Fisher-Jaikumar algorithm. First, clusters are created using a geometric method that partitions the plane into several cones, where the number of cones is equal to the number of vehicles. Then, in the route construction phase, customers are inserted into routes in order of increasing insertion cost, and a traveling salesman optimization algorithm is applied to obtain the optimal travel cost. A two-phase heuristic based on the Fisher-Jaikumar algorithm was proposed in [63].
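The nearest neighbor construction described above can be sketched as follows. The data layout (dictionaries of coordinates and demands), the function name and the choice of opening a new route from the depot once no remaining customer fits the capacity are assumptions made for illustration; they are not taken from any of the cited works.

```python
import math


def nearest_neighbor_routes(depot, customers, demands, capacity):
    """Build routes greedily by always visiting the nearest feasible customer.

    customers: dict name -> (x, y); demands: dict name -> load.
    A new route is opened when no remaining customer fits the capacity.
    """
    if any(d > capacity for d in demands.values()):
        raise ValueError("a single demand exceeds the vehicle capacity")
    unvisited = set(customers)
    routes = []
    while unvisited:
        route, load, position = [], 0, depot
        while True:
            feasible = [c for c in unvisited if load + demands[c] <= capacity]
            if not feasible:
                break
            # Nearest feasible customer; ties are broken by name.
            nxt = min(feasible, key=lambda c: (math.dist(position, customers[c]), c))
            route.append(nxt)
            load += demands[nxt]
            position = customers[nxt]
            unvisited.remove(nxt)
        routes.append(route)
    return routes


# Hypothetical instance: three customers, unit demands, capacity of two.
customers = {"A": (1.0, 0.0), "B": (2.0, 0.0), "C": (0.0, 5.0)}
demands = {"A": 1, "B": 1, "C": 1}
print(nearest_neighbor_routes((0.0, 0.0), customers, demands, capacity=2))
```

The result is feasible but generally not optimal, which is why such constructive heuristics are often followed by an improvement phase.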

The A* family of methods, that is, Dijkstra-derived methods in which a cost function guides the search, can be classified as local search metaheuristics. A method based on a variant of the hybrid-state A* search algorithm for global planning was proposed in [64], where the global path search is permitted to generate steering actions.
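To make the idea of a guided Dijkstra-like search concrete, the following is a minimal sketch of plain A* on a 4-connected occupancy grid with a Manhattan-distance heuristic. The grid encoding (0 for free, 1 for blocked) and the function name are assumptions for illustration; this is not the hybrid-state variant of [64], which additionally reasons about continuous vehicle states.

```python
import heapq


def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) / 1 (blocked) cells."""
    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    cost_so_far = {start: 0}
    parent = {start: None}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if g > cost_so_far[cell]:
            continue  # stale heap entry, a cheaper one was already expanded
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                new_g = g + 1
                if new_g < cost_so_far.get((nr, nc), float("inf")):
                    cost_so_far[(nr, nc)] = new_g
                    parent[(nr, nc)] = cell
                    heapq.heappush(open_heap, (new_g + h((nr, nc)), new_g, (nr, nc)))
    return None  # goal not reachable


grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(astar(grid, (0, 0), (2, 0)))
```

Setting the heuristic to zero in this sketch would reduce it to Dijkstra's algorithm on the same grid.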

A cluster-first route-second 2-phase heuristic-based approach was proposed in [65]. A variant of the Fisher-Jaikumar algorithm was investigated to solve the Capacitated Vehicle Routing Problem. During the constructive phase, routes are created while attempting to minimize the cost at the same time. Then, during the route optimization phase, three metaheuristic methods are used: genetic algorithm, ant colony optimization and particle swarm optimization.

An approach to solving the shortest path problem using a hybrid metaheuristic was proposed in [10]. The authors combined the Variable Neighborhood Search metaheuristic with genetic algorithms. Unlike standard methods such as Dijkstra, metaheuristics allow computing multi-objective routes that meet additional constraints even in large-scale road networks.

Apart from the exact and approximate approaches, there exists a hybrid model in the state-of-the-art in which a heuristic is applied together with an exact algorithm.

Apart from the proposed architecture of route planning algorithms for solving the Vehicle Routing Problem (VRP), other classifications in the state-of-the-art divide the methods depending on the structure used for modeling the space: either graphs or trees. A common way of dividing these approaches is into graph search-based or sampling-based methods [64], [66].

Since motion prediction for obstacles is also part of the trajectory planning tasks, this section is common to both chapters (Chapters III and IV).

Obstacle motion prediction consists of determining the future motion of dynamic obstacles over a short-term time horizon, where these obstacles may be pedestrians, bikes, motorbikes, cars, trucks, etc.

A common classification of motion prediction approaches was proposed in [70], where the authors classified them based on the kind of hypotheses they made about the modeled entities. Thus, they proposed the following three-level classification with an increasing degree of abstraction:

2) Maneuver-based motion models: These models are more advanced since they consider that the future motion of a vehicle depends not only on the laws of physics but also on the maneuvers that the obstacles may perform, independently of the interaction with the other surrounding obstacles.

3) Interaction-aware motion models: These models are the most advanced since they take into account the interactions among obstacles, including the ego-vehicle.

In the last few years, some reviews of motion prediction for automated vehicles have been published [71], where the authors present the trends in object motion prediction and discuss the challenges and unfilled gaps in the automated driving domain. In addition, the thesis in [27] covers the state-of-the-art on motion prediction approaches.

Although the main scope of this paper is decision-making and not motion prediction, a few applications of these three motion prediction models can be found below.

[32] and [75] that propose an interaction-aware approach for predicting the decisions of multiple humans that interact with each other during navigation. For this purpose, the authors use the game-theory approach of Nash equilibrium to anticipate collisions with humans and propose several avoidance maneuvers. The behavior of pedestrians when negotiating road crossings with motorized vehicles was studied in [76]. The authors presented the state-of-the-art in vehicle-pedestrian interaction and provided an interaction process that can be divided into five different phases: monitoring of the potential conflict zone, indication of pedestrian crossing intention, assessment of the environment, communication methods among them and decision of maneuver strategies for both vehicle and pedestrian. A motion prediction approach using a Long Short-Term Memory (LSTM)-based Recurrent Neural Network (RNN) for multi-lane turn intersection scenarios was proposed in [23]. The authors focused on improving the decision-making at intersections to achieve human-like accelerations with this learning approach, where the RNN is trained with data of surrounding objects and with the trajectories generated by an MPC-based motion planner for the ego-vehicle, reflecting the interactions among ego and objects. Apart from the previous methods, there is a branch of the interaction-aware motion prediction model called model-based motion prediction. This model assumes that drivers behave in a risk-averse manner, selecting the maneuvers that keep the vehicle away from collision-risk scenarios [27].
This model-based behavior is formulated using

safe overtaking trajectories by combining a rule-based maneuver planner using Finite State Machines and reachable sets. A predictive maneuver-planning method for navigation in public highway traffic was proposed in [25]. The proposed method integrates high-level discrete maneuver decisions, that is, lane and reference speed selection automata (state machine), using an MPC-based motion planning scheme. State machines were also used for maneuver planning in the 2016 Grand Cooperative Driving Challenge [26]. This state machine implemented the interaction protocols for the different scenarios (merging on highways, intersection crossing, and giving free passage to an emergency vehicle on highways). Recently, a maneuver planner based on finite state machines was used in [24] to seek safe overtaking maneuvers with aborting capabilities. A finite state machine based on heuristic rules is used to select an appropriate maneuver (lane keeping, overtaking or aborting), and a combination of reachable sets is used to generate intermediate reference targets based on the current maneuver.

Utility-based approaches use heuristics to evaluate different candidate maneuvers with respect to specific objectives, that is, driving goals. These approaches use utility functions (or cost functions) to measure the level of achievement of each alternative maneuver.
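The utility-based selection just described can be sketched as a weighted sum of per-objective scores, with the maneuver of highest utility being chosen. The objectives (safety, progress, comfort), the weights, the scores and the function name are all hypothetical values chosen for illustration; none of them comes from the cited works.

```python
def select_maneuver(candidates, weights):
    """Return the candidate with the highest weighted utility.

    candidates: dict maneuver -> dict objective -> score in [0, 1].
    weights: dict objective -> non-negative weight.
    """
    def utility(scores):
        return sum(weights[name] * scores[name] for name in weights)

    utilities = {m: utility(scores) for m, scores in candidates.items()}
    best = max(utilities, key=utilities.get)
    return best, utilities


# Hypothetical objectives and scores for three candidate maneuvers.
weights = {"safety": 0.6, "progress": 0.3, "comfort": 0.1}
candidates = {
    "lane_keeping": {"safety": 0.9, "progress": 0.4, "comfort": 0.9},
    "overtaking": {"safety": 0.6, "progress": 0.9, "comfort": 0.6},
    "aborting": {"safety": 0.8, "progress": 0.2, "comfort": 0.5},
}
best, utilities = select_maneuver(candidates, weights)
print(best, utilities)
```

In practice the scores would come from prediction and risk assessment modules, and the weights encode the relative priority of the driving goals.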

Examples of utility-based approaches include optimization-based solutions such as those in [15]. The authors presented a time-optimal maneuver planning system for automatic parallel parking using a simultaneous dynamic optimization approach. The dynamic optimization problem, which includes vehicle kinematics, physical restrictions, collision-avoidance constraints, and an optimization objective, is solved using the interior-point method. In addition, online maneuver planning is performed via receding-horizon optimization.

A hybrid approach was presented in [16], in which a maneuver-based maneuver planner is fused with a motion planner. After the first trajectory set is computed, the maneuver planner extracts tactical patterns depending on the spatial area where the trajectory terminates, how it gets there around the obstacles, and the overtaking order (if any) it follows.

Learning-based approaches are based on a Neural Network trained for a specific purpose. An interaction-aware end-to-end deep reinforcement learning approach was proposed in [31]. This work focused on enhancing traffic flow and safety by inducing altruism in the decision-making process, focusing on merging scenarios such as merging onto highways. The automated vehicle learns whether performing a lane change is more convenient to allow the other vehicles to merge into the lane. Some specific reviews on the state of the art covering decision-making strategies, including maneuver-planning approaches, have been presented recently in [79] and [80].

[81]. The authors in [82] discussed the challenges of cooperative driving and proposed a system called COMPACT to deal with maneuver planning. They focused on the overtaking scenario on secondary roads with traffic in front and compared their approach with elastic bands and tree search based algorithms, stating that their approach maximizes distances between objects as the two other vehicles yield, drive to their right road boundary and decelerate. A two-dimensional maneuver planner in a distributed predictive control framework was proposed in [83] to reduce energy consumption through traffic motion harmonization, thereby improving traffic flow and travel time. The approach includes explicit coordination constraints between the connected vehicles driving in mixed traffic on multi-lane roads.

Motion planning corresponds to the operational level of decision-making as presented in Figure 1. It is responsible for defining the sequence of vehicle configurations (position and orientation in time) that allow the vehicle to move from the current position up to the planning horizon, considering both vehicle and environment constraints. In the state-of-the-art, motion planning is also referred to as trajectory planning.

Motion planning consists of two tasks: path planning, searching the path in the vehicle's configuration space; and speed planning, generating a speed profile, that is, defining a speed (plan in time) per space configuration. These tasks can be performed either sequentially or simultaneously, as explained in the following subsections.

Table 4.

According to the architectures for decision-making presented in Figure 2, planning approaches can be divided into three different types depending on the architecture: sequential, behavior-aware or end-to-end planning.

Motion planning in both sequential and parallel hierarchical approaches can be mostly found in the modules highlighted in green in Figure 6.

even the speed profile. According to [64], these methods can also be called functional methods and can be divided into closed-form functional methods (methods whose coordinates have a closed-form expression) and parametric functional methods (methods whose curvature is defined as a parametric curve, which is a function of their arc length). The most common closed-form methods are polynomials, Bézier curves, splines and NURBS; and the most common parametric methods are Dubins paths, clothoids, cubic spirals and quintic G2 splines.

(ii) Graph-search based: These methods aim to find the optimal route on a graph and are mostly used for route planning (as seen in Chapter II). However, some of these methods (such as A*) can also be applied for local planning in static environments such as parking lots.

A trajectory generation approach for urban environments based on interpolation of consecutive quintic Bézier curves was proposed in [33]. The authors used quintic Bézier curves since they ensure G2 geometric continuity (the curves share the same tangent direction and curvature at the joint point) to provide comfort for motion. For this purpose, the authors used the Douglas-Peucker algorithm to compute the reference points for generating the set of quintic Bézier curves that will be interpolated to generate the path inside a corridor. They then evaluated the candidate paths and checked if there was a risk of collision with either static or dynamic obstacles. In case of collision risk with a static obstacle, a set of collision avoidance curves was generated by changing the position of the point perpendicular to the obstacle in the lane. In the case of collision risk with a dynamic obstacle, they analyzed

An evidential occupancy grid was used in [35] to model the environment and represent the uncertainty produced by surrounding obstacles. It serves to determine the path candidates (clothoid tentacles) that are navigable. Chebly also proposed a motion planning approach using the tentacles method with a clothoid form in [48]. The author combined navigation through clothoid tentacles selection with a high-level maneuver planner for the obstacle avoidance application. Yu et al. [39] proposed a layered motion planning framework that handles geometry, nonholonomic and dynamic constraints with distinct methods. After a global path modification layer is used to solve the geometric constraints, a multiple-phase sampling layer is performed, generating an occupancy grid map. The authors combined this occupancy-grid based discretization with an optimization-based path generation to consider the nonholonomic constraints. Finally, they formulated the speed planning over the path as a convex optimization problem to handle the dynamic constraints. Gu et al. proposed a sampling-based motion planner fused with a tactical maneuver discovery reasoning in [16]. Distinct tactical maneuver patterns are extracted from the set of feasible trajectories computed via path generation primitives such as splines (both for path and speed profile). A cost function is then used to choose the final trajectory within the most appropriate tactical pattern set.
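As a small illustration of the closed-form path primitives discussed above, the following sketch evaluates a single quintic Bézier segment in Bernstein form. The six control points describe a hypothetical lane change of 3.5 m over 25 m and are assumptions for illustration only; the Douglas-Peucker reference point selection and the joining of consecutive segments with G2 continuity used in [33] are not reproduced here.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a Bézier curve of any degree at t in [0, 1] (Bernstein form)."""
    n = len(control_points) - 1
    x = y = 0.0
    for i, (px, py) in enumerate(control_points):
        # Bernstein basis polynomial B_{i,n}(t).
        b = comb(n, i) * (1 - t) ** (n - i) * t ** i
        x += b * px
        y += b * py
    return x, y


def sample_path(control_points, samples=11):
    """Sample the curve at evenly spaced parameter values."""
    return [bezier_point(control_points, k / (samples - 1)) for k in range(samples)]


# Quintic segment (six control points) for a hypothetical lane change.
lane_change = [(0, 0), (5, 0), (10, 0), (15, 3.5), (20, 3.5), (25, 3.5)]
for x, y in sample_path(lane_change):
    print(f"{x:6.2f} {y:5.2f}")
```

Placing the first three and the last three control points on straight lines, as done here, makes the heading and curvature at both ends match a straight lane, which is one simple way to obtain smooth joins with adjacent straight segments.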

Risk assessment is an important element in the evaluation of candidate paths using sampling-based approaches. Pierson et al. [90] applied risk level sets to measure driving congestion, learning the common risk thresholds from the NGSIM and highD driving datasets to classify risk situations into low, medium and high risk. Qin et al. [91] focused on the risk analysis. The authors formulated a safety assessment of the actions of a level 3 automated vehicle with respect to its environment as constrained optimization problems, solved using Dynamic Programming algorithms. For that purpose, they divided risk into longitudinal risk and lateral risk, regarding the collision risk with the intermediate front object and the risk of crossing the lane boundaries, respectively. A safety verification system for merge and crossing scenarios was presented in [92]. The authors present a Responsibility-Sensitive Safety (RSS) system and integrate the defined safety constraints into motion planning with reachable sets. The next group of approaches is learning-based approaches.

The stability level corresponds to the last level of the decision-making pyramid presented in Figure 1, where control strategies are applied to select and track a reference input. In automated driving, this input can be a path, speed profile, trajectories (paths with speed profile), objects (e.g. vehicles) or lanes. For each input, the control system selects the reference to be tracked, and a control law is then applied to stabilize the vehicle around the selected reference. Thus, control systems for decision-making are more reactive than the previous levels in the pyramid, operating in a few tens of milliseconds to command the vehicle actuators. This command is often calculated in two control steps: high-level control, which computes the motion commands to follow the reference input; and low-level control, which computes the actuator commands from the motion commands. This separation allows high-level control to be independent of the actuators and favors reusability. Additionally, in the automated driving domain there are two main types of control: decoupled control, where longitudinal and lateral references are tracked by two independent controllers; and coupled control, where a single control law tracks both longitudinal and lateral references.
trajectories in overtaking scenarios with capabilities for aborting the maneuver to merge back in the lane.

In addition to MPC, optimal control methods such as the Linear Quadratic Regulator (LQR) controller can be used for simultaneous planning and tracking. An Adaptive Constrained Iterative LQR based motion planning was used in [103] in obstacle avoidance scenarios, considering a two-stage uncertainty aware prediction.
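For readers unfamiliar with LQR, the following is a deliberately simplified sketch: an infinite-horizon, discrete-time LQR for a scalar tracking-error model x[k+1] = a*x[k] + b*u[k] with stage cost q*x^2 + r*u^2, solved by iterating the Riccati recursion. The model, the numerical values and the function name are assumptions for illustration; this is the textbook unconstrained LQR, not the adaptive constrained iterative LQR of [103].

```python
def scalar_lqr_gain(a, b, q, r, iterations=500):
    """Feedback gain K (u = -K * x) for x[k+1] = a*x[k] + b*u[k]."""
    p = q
    for _ in range(iterations):
        # Scalar discrete-time Riccati recursion.
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def simulate(a, b, k, x0, steps):
    """Closed-loop evolution of the tracking error under u = -k * x."""
    xs = [x0]
    for _ in range(steps):
        u = -k * xs[-1]
        xs.append(a * xs[-1] + b * u)
    return xs


# Hypothetical speed-error model: integrator with a 0.1 s time step.
a, b, q, r = 1.0, 0.1, 1.0, 0.1
k = scalar_lqr_gain(a, b, q, r)
errors = simulate(a, b, k, x0=2.0, steps=20)
print(round(k, 4), [round(e, 3) for e in errors[:5]])
```

Increasing r relative to q in this sketch penalizes control effort more heavily and yields a smaller gain, which is the basic trade-off that LQR-based trackers tune.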

VOLUME 10, 2022

In this state-of-the-art review, more than 100 scientific articles written in the last lustrum were studied. These studies have shown the capacity of artificial intelligence-based algorithms to solve decision-making problems applied to auto-

no deep-learning method applied to decision making and planning has been integrated into production systems.
Automated vehicles will continue to affect passenger road transport in the short term. Their impact on urban development and relevant challenges were studied in [105]. Among these challenges we can highlight the following aspects: (i) Accessibility: Automated vehicles will have to adapt to operate as either private, shared or public means of transport. (ii) Traffic: AVs have the opportunity to free public space and serve areas of limited roadway capacity. (iii) Infrastructure: AVs will ease the development of new urban infrastructure and the integration of the AV network into energy and telecommunication networks, contributing to the development of smart cities.

In terms of communications, V2X systems are still under development and they have the potential to improve the decision-making process [106]. For instance, sharing the position, orientation, speed, route or maneuver intention among vehicles would provide precise information to complement the current prediction systems.

The Dimensions.ai website [107] was used to quantify the number of research publications from 2000 to 2021 for the three planning levels (route, maneuver, and motion) as well as for decision-making in general terms, with special emphasis on the last lustrum, highlighted in gray. The search queries used for generating Figures 7-10 are regular expressions that include all the previous terms for decision making and, for each specific topic, the terms related to the methods indicated in each figure. Additionally, we ensure that the search contains either the term automated driving or autonomous vehicle or any of their combinations, to ensure the coverage of only AV applications.

Figure 7 shows the evolution of decision-making in automated driving. This figure shows the number of publications per year containing in the title or abstract the general decision-making term (depicted in yellow) and the specific terms (and their equivalents) for each level of decision-making, i.e. route planning (in blue), maneuver planning (in orange) and motion planning (in green).

As can be inferred from the figure, research on decision-making for automated driving has shown a growing trend during the last lustrum, from fewer than 100 publications in 2016 to over 500 publications in 2021. Although energy-efficient route planning approaches have been studied in recent years, research on route planning has shown almost no growth compared with motion planning and decision-making in general. It should also be noted that maneuver planning publications by themselves are not so numerous because they are usually referred to as decision-making systems in the state-of-the-art.

In terms of route planning, Figure 8 shows that exact algorithms remain the most commonly used, with Dijkstra's algorithm still the most common choice. In addition, the impact of metaheuristic algorithms has significantly increased in the last five years, from fewer than 200 publications in 2016 to over 800 in 2021.