Connected and Autonomous Vehicle Cohort Speed Control Optimization via Neuroevolution

Predictive Energy Management (PrEM) research is at the forefront of modern transportation's efforts to reduce energy consumption. PrEM optimization algorithms have so far been tailored to selfish vehicle operation and implemented as vehicle dynamics and/or adaptive powertrain control functions. With the progress in vehicle automation, this paper focuses on extending PrEM into the realm of a System of Systems (SoS). The proposed approach uses the information shared among Connected and Automated Vehicles (CAV) and the infrastructure to synthesize a reduced-energy speed trajectory at the cohort level within urban environments. Neuroevolution is employed to obtain a generalized optimal controller that is robust to the emergent behaviors typical of multi-agent SoS. The authors demonstrate the use of heuristics and systems engineering processes in abstracting the resulting neural network and integrating it within the control architecture. This enables novel added-value features such as green wave pass/fail classification and e-Horizon velocity prediction. The resulting controller runs faster than real time. It was validated in a multi-agent simulation environment and on a real-world closed-loop track at the American Center for Mobility (ACM). Testing a mixed CAV cohort of GM Bolt and Volt vehicles at ACM demonstrated energy reductions of 7% to 22%, depending on the scenario.

The authors demonstrate that Neuroevolution can directly

II. PROBLEM DEFINITION AND SYNTHESIS

The proposed CAV SoS combines SAE Level 3 vehicle operation with infrastructure connectivity. A safe vehicle-to-vehicle distance is maintained via Adaptive Cruise Control (ACC). In doing so, the AV stack enables safe autonomy at the vehicle level. The CAVs' lead vehicle receives information from the connected traffic lights along the route via a cellular network (Fig. 1).

The goal is to control the cohort speed as a single entity and reduce its global energy consumption. At the same time, any local PrEM powertrain function can adapt its energy management strategy locally by receiving a predicted speed e-Horizon. At the cohort level, speed optimization aims to reduce the number of acceleration and deceleration events, which are the predominant road load term in city driving. These events can be minimized by achieving a "green wave" through the traffic light network. The AI learning objective function (LOF) is built on the ability of the entire cohort to pass within the green light window (the reward) while minimizing its dynamic energy demand, as follows:

where V_c and A_c are the cohort speed and acceleration, respectively. The reward is a fixed value if the cohort passes successfully and zero otherwise. The reward forces the controller to pass the green light and build up speed. The energy term forces the system to minimize inefficient speed fluctuations. Note that the validation section shows that this equation correlates directly with fuel efficiency improvement (see Fig. 7).
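The LOF equation itself is missing from this text. For illustration only, one plausible form consistent with the definitions above is LOF = Σ max(V_c·A_c, 0)·Δt − R·pass. Here V_c·A_c is the inertial power per unit mass, R is the fixed reward, and pass is 1 if the whole cohort clears the green window and 0 otherwise. The positive-part energy term, the discrete sum and the default reward value are assumptions, not the authors' formulation. Because the LOF is minimized, the reward enters with a negative sign.

```python
from typing import Sequence


def lof(v_c: Sequence[float], a_c: Sequence[float], dt: float,
        passed_green: bool, reward: float = 100.0) -> float:
    """Illustrative learning objective (lower is better).

    ASSUMED form, not the paper's equation:
      energy term = sum of max(V_c * A_c, 0) * dt   (J/kg, propulsive part only)
      reward term = fixed reward if the whole cohort passed on green, else 0
    """
    if len(v_c) != len(a_c):
        raise ValueError("v_c and a_c must have the same length")
    # Inertial power per unit mass is v * a; only accelerating samples count.
    energy = sum(max(v * a, 0.0) for v, a in zip(v_c, a_c)) * dt
    # Minimization: a successful pass lowers the objective by the reward.
    return energy - (reward if passed_green else 0.0)


# Usage: a steady profile vs. a stop-and-go profile, both passing on green.
smooth = lof([10.0] * 4, [0.0] * 4, dt=1.0, passed_green=True)
stop_go = lof([10.0, 5.0, 10.0, 5.0], [-1.0, 1.0, -1.0, 1.0], dt=1.0, passed_green=True)
```

Under this assumed form, the steady profile scores lower (better) than the oscillating one, even though both earn the reward.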

The SoS architecture is designed around each autonomous agent's ability to safely follow the vehicle ahead. This allows the problem to be abstracted around a simpler set of learning parameters, shown in Table 1. The number of vehicles, the inter-vehicle gaps and the vehicle sizes can be abstracted into a single dynamic cohort length L. The lead vehicle's distance d_1 to the traffic light is used as the cohort distance D to the light. The lead vehicle's target speed V_t now orchestrates the entire cohort operation, resulting in an achieved cohort speed V_c. The controller shall learn from the internal dynamic behavior of L and V_c to compute a new speed target V_t. The learning process requires a significant number of dynamic scenarios representative of real-world conditions. These scenarios are bounded by the globally achievable comfortable acceleration A_min, which depends on the cohort's powertrain and vehicle-class content. While considering both light- and heavy-duty CAVs, the following heuristics enable simplification.
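The state abstraction described in this paragraph can be sketched in Python. A cohort's per-vehicle data collapse into the cohort length L, the cohort distance D and the acceleration bound A_min. The `Vehicle` record and `abstract_cohort` function are hypothetical. Taking A_min as the minimum over cohort members is our assumption; the text only states that A_min depends on the cohort's powertrain and vehicle-class content.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Vehicle:
    """Hypothetical per-vehicle record (positions increase toward the light)."""
    front_pos_m: float         # front bumper position along the route
    length_m: float            # vehicle length
    comfort_accel_mps2: float  # comfortable acceleration this vehicle sustains


def abstract_cohort(vehicles: List[Vehicle],
                    light_pos_m: float) -> Tuple[float, float, float]:
    """Collapse a cohort (ordered lead first) into (L, D, A_min)."""
    lead, tail = vehicles[0], vehicles[-1]
    # L: lead front bumper to tail rear bumper, so all gaps and sizes are included.
    L = lead.front_pos_m - (tail.front_pos_m - tail.length_m)
    # D: the lead vehicle distance d_1 to the stop line.
    D = light_pos_m - lead.front_pos_m
    # A_min (assumed): the cohort accelerates no faster than its least capable member.
    A_min = min(v.comfort_accel_mps2 for v in vehicles)
    return L, D, A_min


# Usage: two cars followed by a Class 8 truck, light 800 m down the route.
cohort = [Vehicle(500.0, 4.5, 2.0), Vehicle(480.0, 4.6, 1.8), Vehicle(450.0, 21.0, 0.6)]
L, D, A_min = abstract_cohort(cohort, light_pos_m=800.0)
```

Whatever the number of vehicles, the controller then only sees three scalars.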

As discussed in the introduction, the authors seek to avoid the over-simplification of complex system behavior that classical optimal control algorithms require. Neuroevolution has been shown to be capable of direct learning for a wide variety of applications [10], [11], including multi-objective optimization problems [12]. We also seek to achieve faster

The validation simulator uses AVL's Multi-Agent simulator, which was developed to represent real-world driving conditions accurately.

We minimized training time by implementing a two-step neuroevolution process. In the first step, the topologies of the donor neural networks were manually selected from a library of predefined neural networks. This step speeds up the evolution process, as topology evolution is still a complex and time-intensive task [11]. In the second step, a Particle Swarm Optimization (PSO) algorithm tuned each node's weights and bias and selected its activation function. During experimentation, the authors preferred this method to Genetic Algorithms for both convergence speed and solution quality. As a result, the learning process takes just 10 hours per neural network candidate on a 16-core desktop. The neural network with the lowest LOF value was selected for validation.
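The second step can be sketched with a minimal global-best PSO that tunes the weights and biases of a fixed-topology network. This sketch makes several assumptions:

- Activation-function selection is omitted.
- The inertia and acceleration coefficients are common textbook values, not the authors' settings.
- The 1-2-1 network `tiny_net` is hypothetical.
- `stand_in_lof` is a hypothetical curve-fitting error standing in for the simulation-based LOF.

```python
import random
from typing import Callable, List, Tuple


def pso_minimize(objective: Callable[[List[float]], float], dim: int,
                 n_particles: int = 30, iters: int = 200,
                 bounds: Tuple[float, float] = (-1.0, 1.0),
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Global-best PSO: returns (best parameter vector, best objective value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                     # personal bests
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]    # swarm (global) best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep particles inside the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def tiny_net(params: List[float], x: float) -> float:
    """Fixed 1-2-1 topology, ReLU hidden units; params = [w1, b1, w2, b2, v1, v2, c]."""
    w1, b1, w2, b2, v1, v2, c = params
    h1, h2 = max(0.0, w1 * x + b1), max(0.0, w2 * x + b2)
    return v1 * h1 + v2 * h2 + c


def stand_in_lof(params: List[float]) -> float:
    """Placeholder for the simulation-based LOF: mean squared error against |x|."""
    xs = [i / 10 for i in range(-10, 11)]
    return sum((tiny_net(params, x) - abs(x)) ** 2 for x in xs) / len(xs)


best_params, best_val = pso_minimize(stand_in_lof, dim=7)
```

In the real process each objective evaluation is a full traffic simulation, which is why a per-candidate training time of hours is plausible.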

A single "training" traffic light was added to the traffic environment to provide the infrastructure information (T_g, T_r) to the neural network. A uniform Monte Carlo (MC) simulation was used to vary the environment parameters (Table 2)

with T_g and T_r. If the assessment simulation result leads to an infeasible solution, the information for the next green window is requested and fed to the neural network input layer. This information is obtained by incrementing T_g and T_r by the traffic light period. The speed of the neural network was further exploited to generate several speed profiles for different cohort design scenarios. For example, if the current conditions are deemed infeasible, the cohort length L can be varied until a feasible solution is reached. This information can be used to split the cohort. Together, these added-value features form a complete Neuroevolution-based cohort management system (Fig. 3).
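The retry-next-window and cohort-length logic described above can be sketched as follows. The sketch relies on two assumptions:

- The neural network is replaced by a simple constant-speed kinematic feasibility check.
- T_g and T_r are read as the times until the green window opens and closes.

All function names are hypothetical.

```python
from typing import Optional, Tuple

SpeedRange = Tuple[float, float]


def feasible_speed_range(D: float, L: float, t_green: float, t_red: float,
                         v_min: float, v_max: float) -> Optional[SpeedRange]:
    """Constant cruise speeds at which the whole cohort passes on green.
    t_green / t_red: seconds until the window opens / closes (t_red > 0)."""
    # Lead must not reach the stop line before green: D / v >= t_green.
    upper = v_max if t_green <= 0 else min(v_max, D / t_green)
    # Tail must clear the stop line before red: (D + L) / v <= t_red.
    lower = max(v_min, (D + L) / t_red)
    return (lower, upper) if lower <= upper else None


def next_feasible_window(D: float, L: float, t_green: float, t_red: float,
                         period: float, v_min: float, v_max: float,
                         max_windows: int = 3) -> Optional[Tuple[int, SpeedRange]]:
    """Shift (t_green, t_red) by whole signal periods until a window is feasible."""
    for k in range(max_windows):
        rng = feasible_speed_range(D, L, t_green + k * period,
                                   t_red + k * period, v_min, v_max)
        if rng is not None:
            return k, rng
    return None


def max_passing_length(D: float, t_green: float, t_red: float, v_min: float,
                       v_max: float, L_full: float, step: float) -> float:
    """Shrink the cohort length until the current window becomes feasible;
    the vehicles left behind would form a second cohort (a split)."""
    L = L_full
    while L > 0:
        if feasible_speed_range(D, L, t_green, t_red, v_min, v_max) is not None:
            return L
        L -= step
    return 0
```

For example, `next_feasible_window(600, 100, 10, 20, 40, 5, 20)` skips the current window and returns the next one. `max_passing_length(300, 20, 30, 5, 20, 200, 10)` reports that only the first 150 m of a 200 m cohort can pass. In the paper these decisions come from querying the trained network, not from a closed-form check.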

FIGURE 3. Cohort management with its core neural network and added-value features such as e-Horizon generation and pass/fail classification.

With a large number of activation functions available, the PSO process is likely to converge to a local optimum. A preliminary step was therefore added before the basic neuroevolution process described above. The activation function for each neural network node is preselected using a Weight Agnostic Neural Network (WANN) step as described in [19]. During each activation function allocation iteration, the network weights are kept uniform across the neural network. The iterations are driven by a Latin Hypercube design of experiment matrix. For simplicity, this matrix contains an optimal permutation of four activation functions: ReLU (poslin), Linear (purelin), Radial Basis (radbas) and hyperbolic tangent sigmoid (tansig). Weights were varied between -1 and 1. For each generated neural network, 1,500 learning simulations are run and the LOF is recorded. The LOF statistics are plotted in Fig. 4. For each node, the activation function with the lowest mean LOF (minimization) is selected. Re-running the basic neuroevolution process with these pre-allocated activation functions generated a more robust and higher-performing controller (Fig. 5). Note that this controller was not used in the embedded vehicle controller, so as to retain ReLU's computing efficiency.
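The activation preselection step can be sketched as follows. Each row of a design assigns one of the four activation functions to every node. Each candidate is scored with a single weight value shared by all connections, as in WANN. Every node then keeps the activation with the lowest mean LOF. This sketch makes several assumptions:

- `balanced_design` is a shuffled, level-balanced design in the spirit of a Latin Hypercube, not the authors' optimal permutation matrix.
- `toy_lof` is a hypothetical stand-in for the 1,500-simulation LOF evaluation.
- The shared-weight values are illustrative.

```python
import math
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# MATLAB-style names used in the text, with their standard definitions.
ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "poslin": lambda x: max(0.0, x),       # ReLU
    "purelin": lambda x: x,                # linear
    "radbas": lambda x: math.exp(-x * x),  # radial basis
    "tansig": math.tanh,                   # equals 2/(1+exp(-2x)) - 1
}


def balanced_design(n_nodes: int, reps: int, seed: int = 0) -> List[List[str]]:
    """Rows are candidate networks (one activation per node). In every column each
    activation appears exactly `reps` times, in independently shuffled order."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_nodes):
        col = list(ACTIVATIONS) * reps
        rng.shuffle(col)
        columns.append(col)
    return [list(row) for row in zip(*columns)]


def select_activations(design: List[List[str]],
                       lof: Callable[[Sequence[str], float], float],
                       shared_weights: Sequence[float]) -> List[str]:
    """Per node, pick the activation with the lowest mean LOF across the design."""
    n_nodes = len(design[0])
    scores: Dict[Tuple[int, str], List[float]] = {
        (j, a): [] for j in range(n_nodes) for a in ACTIVATIONS}
    for row in design:
        # Weight-agnostic evaluation: one shared weight value for every connection.
        run_lof = mean(lof(row, w) for w in shared_weights)
        for j, a in enumerate(row):
            scores[(j, a)].append(run_lof)
    return [min(ACTIVATIONS, key=lambda a: mean(scores[(j, a)]))
            for j in range(n_nodes)]


def toy_lof(row: Sequence[str], w: float) -> float:
    """Stand-in objective: fit y = x**2 with one hidden layer sharing weight w."""
    xs = [i / 5 for i in range(-5, 6)]
    err = 0.0
    for x in xs:
        y_hat = w * sum(ACTIVATIONS[a](w * x) for a in row)
        err += (y_hat - x * x) ** 2
    return err / len(xs)


design = balanced_design(n_nodes=3, reps=5)  # 20 candidate networks
chosen = select_activations(design, toy_lof, shared_weights=[-1.0, -0.5, 0.5, 1.0])
```

Because each activation appears equally often in every column, the per-node mean LOF compares the activations on an even footing before the weights are tuned.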

We present several sets of validation results:

- results from the learning simulation environment with a P3 powertrain;
- results from AVL's simulation with multiple powertrain models;
- real-world results on a closed-loop track at ACM.

VOLUME 10, 2022

that the LOF strongly correlates with fuel economy improvement. This validates the assumption that minimizing speed fluctuation is a main driver of energy usage reduction in city conditions (Fig. 7).

In this simulation environment, the controller is now subjected to realistic vehicle dynamics. An edge-case scenario is presented here, with a cohort of seven different vehicle types, including a Class 8 vehicle. Notably, the Class 8 dynamics during gear shifts caused slower-than-anticipated acceleration rates (compared with the A_min range used during learning). This compromised the cohort's integrity and prevented all the vehicles from passing during a single green window. Here the value of the pass/fail classifier became evident: it allowed the cohort to split appropriately when cohort integrity became an issue. With this feature, each vehicle consistently achieved a positive energy efficiency improvement (Fig. 8). The Sedan PHEV reached a 40% energy consumption reduction in the best-case scenario, while the cohort-level improvement was at least 5%.

FIGURE 8. Heterogeneous cohort performance box plot statistics across 30 scenarios. On the right, performance improves with the integration of the pass/fail classification feature. This feature enables the cohort to split when necessary and hence prevents the complete cohort from stopping at a red light.

The controller was implemented on Gen II Chevrolet Volt and Bolt vehicles up-fitted with a Drive-By-Wire system. The cars are also equipped with a dSPACE MicroAutoBox II (MAB II), which functions as an onboard processing unit. The MAB II interfaces with the Drive-By-Wire system, the vehicle CAN channels and various instruments. It can also act as an on-board computer to run user-defined programs and algorithms. The Neuroevolution controller was compiled into C code from Simulink and loaded onto the MAB II (Fig. 9).
The controller's optimal target speed is sent via CAN to the Drive-By-Wire system. That system has its own controller and calibration tables to determine the throttle and brake pedal positions required to achieve the demanded vehicle speed.

The system was tested at ACM. A two-mile route (Fig. 10) with two randomly timed and phased connected traffic lights was driven ten times with and without the neuroevolution controller. A cohort of three PHEVs achieved a 12% energy reduction and an 8% shorter trip time compared with normal autonomous operation in a 55 MPH speed limit scenario (Fig. 11).

More recent testing at ACM in July 2022 provided the results shown in Table 3. The vehicle order was varied, as were the traffic light phasing and timing. The lower benefit observed for the following vehicles was attributed to imperfect ACC behavior in keeping the gap and speed steady behind the lead vehicle. In each case, the lead vehicle achieved a 16% to 29% energy reduction. Test data also demonstrated that the neuroevolved controller was able to recover when signal latency was present. For example, it targeted a higher speed once its input layer was finally updated with new values.

FIGURE 12. Velocity profile differences between the neuroevolved controller and a driver without (top, in blue) or with (bottom, in purple) knowledge of the time to green. In the first case, an energy benefit of 38.5% is recorded. This benefit decreases to 5% in the second case.
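The hand-off of the target speed from the MAB II to the Drive-By-Wire system over CAN could look like the following sketch. The actual message ID, signal scaling and byte order are not given here. The 0x200 arbitration ID, the 0.01 m/s-per-bit unsigned 16-bit signal and the little-endian layout are hypothetical choices made only to show the packing and saturation logic.

```python
import struct
from typing import Tuple

# HYPOTHETICAL signal layout, for illustration only.
TARGET_SPEED_CAN_ID = 0x200   # assumed 11-bit arbitration ID
SPEED_RESOLUTION = 0.01       # assumed m/s per bit, unsigned 16-bit signal


def encode_target_speed(v_target_mps: float) -> Tuple[int, bytes]:
    """Pack a target speed into an 8-byte classic-CAN payload (little-endian)."""
    raw = round(v_target_mps / SPEED_RESOLUTION)
    raw = min(max(raw, 0), 0xFFFF)  # saturate to the 16-bit signal range
    # "<H6x": one little-endian uint16 followed by six zero padding bytes.
    return TARGET_SPEED_CAN_ID, struct.pack("<H6x", raw)


def decode_target_speed(payload: bytes) -> float:
    """Inverse of encode_target_speed for the assumed layout."""
    (raw,) = struct.unpack_from("<H", payload)
    return raw * SPEED_RESOLUTION


# Usage: a 55 MPH (about 24.59 m/s) target speed.
can_id, payload = encode_target_speed(55 * 0.44704)
```

Saturating rather than wrapping out-of-range values keeps a faulty speed request from being reinterpreted as a small speed on the receiving side.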

In another experiment, a driver who was given the time to green drove a vehicle in hypermiling mode to achieve the best possible fuel economy on the track. Although the driver significantly reduced energy usage, the neuroevolved controller still beat the driver by an additional 5% reduction in energy usage (Fig. 12).

Neuroevolution provides an effective mechanism to infer self-adaptive optimal control strategies and hence offers a way to ensure sustained optimality. Its development

The authors would like to thank:

- Erin Boyd, Daniel Nardozzi and Danielle Chou at DOE for their support and guidance;
- Jingtao Ma at Traffic Technology Services for the virtual lights setup and for providing access to the TTS communication system;
- Matthew Hunkler at Navistar for the Class 8 truck energy consumption model;
- Besim Demirovic, Neeraj Rama and Sara Mohon at BorgWarner for providing light-duty BEV and HEV energy models.