Incremental End-to-End Learning for Lateral Control in Autonomous Driving

The use of high-quality data is required to accomplish lateral control through Behavioral Cloning (BC) in an End-to-End (E2E) learning system. The majority of E2E learning systems gather this high-quality data all at once before beginning the training phase (i.e., the training process does not start until the end of the data collection process). The demand for high-quality data necessitates a large amount of human effort and substantial time and money spent waiting for data collection to be completed. As a result, it is critical to find a viable option that reduces both the time and cost of data collection while maintaining the performance of a trained vehicle controller. This paper offers a novel behavioral cloning approach for lateral vehicle control to address the aforementioned problems. The proposed technique begins by collecting the least amount of human driving data possible. The data from the human driver are used to train a convolutional neural network for lateral control. The trained neural network is subsequently deployed to the vehicle's automated driving controller, replacing the human driver. At this point, the human driver is out of the loop, and the automated driving controller, trained on the initial data from the human driver, drives the vehicle to collect further training data. The driving data obtained are sent to a convolutional neural network training module, and the newly trained neural network is deployed to the automated driving controller, which drives the vehicle further. Data collection alternates with neural network training on the collected data until the neural network learns to correctly associate an image input with a steering angle. The proposed incremental approach was extensively tested in simulated environments, and the results are promising. The neural networks trained incrementally on data collected by automated controllers were able to drive the vehicle successfully on two different tracks.


I. INTRODUCTION
Reinforcement Learning (RL) and Imitation Learning (IL) account for a large share of human learning. IL is also known as Apprenticeship Learning (AL) or Learning from Demonstration (LfD). Both RL and IL rely on experience. In RL, one learns to do a task based on one's own experiences, which may be challenging and time-consuming. IL, on the other hand, mimics an expert's behavior in order to learn from it [1] [2]. For example, it is unreasonable to expect a person who is given a car to learn to drive on their own without an expert's help (RL). It would be considerably easier to show the person how to drive and let them learn from the demonstration (IL).
Similar to human learning, RL and IL are part of Machine Learning (ML) too. In RL, an agent interacts with an environment by following a policy. The agent learns by mapping states to actions. In each state of the environment, it behaves in line with the policy, receiving a reward and transitioning to a new state as a result. The goal of RL is to develop the best policy (using trial-and-error) for maximizing long-term rewards [3]. The main limitation of RL is that an optimal reward function cannot be easily found in a complex situation.
IL, on the other hand, is an alternative approach when such a reward function is hard to define. E2E autonomous lateral control (Fig. 2) relies on high-quality driving data for deep neural network designs [6]. The majority of E2E learning systems gather this high-quality data all at once before beginning the training phase (i.e., the training process does not start until the end of the data collection process). The demand for high-quality data necessitates a large amount of human effort, as well as the time and money spent waiting for data collection to be completed [7].
As a result, it is critical to find a viable option to reduce both the time and cost of data collecting while also maintaining the performance of vehicle controllers.
Our unique technique offers a practical solution in which a human drives only as much as is required, and an Artificial Intelligence (AI) chauffeur powered by a deep convolutional neural network drives on behalf of the human driver when additional driving data are required. This paper's contributions are the following: (1) This article proposes a unique BC process in which data collection and neural network training are done incrementally with little initial data from a human driver. Driving data are acquired without additional human driver effort by alternating driving by a neural agent for data collection with neural network training. As a result, high-quality training data can be collected by an AI chauffeur with bare-minimum human driving data. (2) The proposed method is built and evaluated using the open-source platform OSCAR (Open-Source Robotic Car Architecture for Research and Education), which was developed by the author. OSCAR is an open-source and full-stack robotic car architecture that is meant to improve robotic research and education in the context of self-driving cars. The findings of the experiments, as well as data and other material, are available online.
The remainder of the paper is organized as follows: Section II introduces a definition of behavior cloning based on E2E approaches and discusses data collection methodologies. Section III provides the formulation of the proposed method. Section IV presents results to demonstrate the validity of the proposed method. Finally, in Section V, the conclusions of the paper and future work are presented.

II. RELATED WORK
This section explains work related to E2E learning, followed by work related to data collection.

A. END-TO-END LEARNING
BC through E2E learning was originally proposed and accomplished by Autonomous Land Vehicle in a Neural Network (ALVINN) [8], which demonstrated that an E2E method utilizing a fully connected neural network was capable of steering a car on a road. They employed photos from a camera and a laser range finder as input. Their neural network was shallow and at that time using a Convolutional Neural Network (CNN) was not an option.
DARPA Autonomous Vehicle (DAVE), an off-road radio control (RC) automobile, was built in a similar manner [9]. Their network learned through training examples how to extract and process important information from raw (unprocessed) video input data, such as 3-D image extraction, edge detection, object detection, and obstacle avoidance steering strategy. Instead of using a simulator, they used an RC truck.
Nonetheless, behavioral cloning adopting an E2E technique did not gain popularity until DAVE-2, which was influenced by ALVINN, was developed [5]. The original DAVE extracted essential visual information from photos captured by a front camera using a CNN, ten years before CNNs demonstrated groundbreaking pattern recognition ability in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). The DAVE-2 system offered PilotNet to scale up the original DAVE's subscale deployment. They employed a CNN model in their research, which accepts only images as input and produces steering angles as output.
Bojarski et al. [10] illustrated how deep neural networks and E2E learning may be utilized to drive a vehicle. Their method produced promising outcomes and gave a clear description of how end-to-end learning works. The authors of [11] suggested an angle-branched network technique for E2E learning. As the angle-branched network makes predictions based on the sub-goal angle, this strategy helped produce not only the steering angle but also the throttle of the automobile. The navigation instruction expressed as a one-hot vector includes less information than the sub-goal angle. They discovered that the sub-goal angle improves the driving model's performance by improving prediction quality for the steering angle.
Wu et al. [12] presented an E2E driving model based on a Convolutional Long Short-Term Memory (Conv-LSTM) neural network with a Multi-scale Spatiotemporal Integration (MSI) module for encoding spatiotemporal input from diverse scales for steering angle prediction. They used future sequential information in the model training process to improve the spatiotemporal characteristics of the driving model for better steering angle prediction. The performance of the suggested driving model was assessed using both public Udacity data and a real-time autonomous vehicle. Despite the originality of the concept, more work is needed for smoother steering control and visualization of the proposed method.
The common process between the research papers presented is that they expect all scenarios needed for data collection, collect all data at once, then train a Neural Network (NN). In our approach, we collect the bare minimum of the required data to start training and have an automated controller drive a vehicle to collect further data in an incremental way.

B. DATA COLLECTION
Data collection necessitates a significant amount of human labor since E2E autonomous lateral control relies on high-quality driving data for deep neural network designs [13] [14]. The researchers in [15] explain the two phases of their data collection process using a simulator. First, they spent one month practicing driving skills to ensure they would obtain high-quality data. The second phase also required around one month, in which they ran a car for 60 hours on the Udacity simulator, resulting in 83,424 images for model training. In [16], the CARSIM simulator was used to collect data. A human driver collected data using a joystick wheel for the training step, using 15 minutes of driving video and resulting in 10,800 training samples.
It is clear how much time is needed to collect a decent amount of data for autonomous driving applications. In [17], Zhang used the TORCS simulator for data collection, collecting around 390,000 samples and selecting 70,000 training examples, which must have taken considerable time. Similarly, in [18], Gao collected 140,000 training images using the CARLA simulator. Table 1 compares the traditional data collection method used by the above-mentioned studies with our data collection method; our approach is discussed in detail in the next sections. Due to the amount of effort needed for data collection, many researchers tend to use public datasets. For example, the authors of [19] used the Berkeley DeepDrive Video dataset (BDDV), which contains videos, photos, and GPS coordinates, with bounding boxes, lane markers, and semantic labels added to the datasets [14].
There are several noteworthy public datasets. Hesai and Scale's PandaSet contains photos as well as LiDAR information with bounding boxes and semantic labeling [20]. Waymo also offers an open dataset containing bounding boxes and semantic segmentations for pictures and LiDAR sensor data. Lyft provides open datasets created with LiDAR and camera sensors [21]. The nuScenes dataset was created with the use of LiDAR, radars, cameras, an IMU, a GPS, and annotated bounding boxes [22]. Most businesses working on autonomous cars attempt to collect as much driving data as possible since excellent datasets are critical for the development of autonomous driving systems. However, the cost of gathering and categorizing datasets is high due to the need for manual labor. This prevents most autonomous driving research communities, particularly those in academia, from gathering any form of driving data.

III. METHOD
Training data for BC can be defined as D^n = {o^n(i), a^n(i)}_{i=1}^{T}, where o^n(i) is an observation at time i, a^n(i) is its corresponding action, and T is the total number of timesteps. Since the proposed method uses multiple training data sets, the superscript n identifies the data set. Learning from an expert's demonstration (o^n → policy π^n_θ → a^n) can be depicted as in Fig. 3. The policy π^n_θ can be found by training a Deep Neural Network (DNN) to minimize the prediction error of a^n(i) when an observation o^n(i) is given.
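This policy search can be written as an imitation-loss minimization. A minimal formulation, assuming the mean-squared-error loss adopted later in Section IV, is:

```latex
\pi^{n}_{\theta^{*}}:\quad
\theta^{*} = \arg\min_{\theta}\;
\frac{1}{T}\sum_{i=1}^{T}
\left\| \pi^{n}_{\theta}\big(o^{n}(i)\big) - a^{n}(i) \right\|^{2}
```

Here π^n_θ(o^n(i)) is the steering angle the DNN predicts for the i-th camera image, and a^n(i) is the recorded expert angle.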
Unlike other E2E learning methods discussed in the previous section, our proposed approach to achieving BC through E2E learning has two phases. In the first, initial phase, an expert human driver collects the minimum driving data D^0 required for training. Then we train a DNN model to learn the policy π^0_θ from the expert's demonstration. Searching for the policy π^0_θ means learning how to map observations to actions (i.e., images to steering angles). After obtaining the policy π^0_θ through DNN training with D^0, a human driver is no longer in the loop. Our approach is distinctive from this point onward: the policy π^0_θ is deployed to an automated driving controller, an AI chauffeur that drives the vehicle to collect further driving data.
Once the policy π^0_θ is determined by DNN training with D^0, the second and main phase starts. This main phase repeats collecting more data with the AI chauffeur and then training another DNN model to produce a new, more experienced AI chauffeur.

Fig. 3. Learning from an expert's demonstration. At each time step i, the DNN model receives data from the driver as observation o^n(i) and action a^n(i) pairs, resulting in the discovery of a policy π^n_θ that maps observations to actions (o^n → policy π^n_θ → a^n).

The AI chauffeur (n = {1, ..., N}, where N is the total number of data collections by an AI chauffeur) will use the policy π^{n-1}_θ; n = 0 indicates the human driver. Fig. 4 shows how AI drivers collect more data to increase the overall accuracy of the system. The first trained DNN is deployed to the 1st AI chauffeur, which uses the learned policy π^0_θ to drive a vehicle to collect more data; the newly collected data are used to train a DNN with policy π^1_θ that will be deployed to the 2nd AI chauffeur, and so on. The Data Collection Module in Fig. 4 represents the data collecting process of an AI chauffeur. We begin with the initial observation o^n(1) from the initial state s^n(1) and the known policy π^{n-1}_θ. The initial observation leads to the first action a^n(1), which determines the second state s^n(2); the second state leads to the second observation o^n(2), which leads to the second action a^n(2), and so on. Fig. 5 shows the high-level system overview. To begin, a human driver drives a virtual car and captures only the bare minimum of data needed to train a neural network to drive a short distance by replicating the human driver's behavior. The human driver is no longer in the loop at this time. After that, a driving controller employs the trained neural network; we call it an AI chauffeur driving a simulated vehicle. We record driving data for a little longer as the AI chauffeur drives the vehicle. After training a new neural network with the newly obtained data, the AI's brain is replaced with the new neural network in the hope of improving its driving performance next time. This process is repeated iteratively until we have a neural network that can predict steering angles reliably enough to control the lateral motion of the simulated vehicle.
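The alternation between data collection and training described above can be sketched as follows; the function names are illustrative placeholders, not part of OSCAR's actual API:

```python
def incremental_bc(human_data, train_fn, drive_fn, filter_fn, n_rounds):
    """Alternate AI-chauffeur data collection with DNN training.

    human_data: D^0, the minimal data set from the human driver.
    train_fn:   trains a DNN on a data set and returns a policy.
    drive_fn:   lets the given policy drive to collect a new data set D^n.
    filter_fn:  the minimal manual editing step (removing bad behavior).
    """
    dataset = list(human_data)
    policy = train_fn(dataset)            # pi^0: first AI chauffeur
    for n in range(1, n_rounds + 1):
        new_data = drive_fn(policy)       # AI chauffeur collects D^n
        dataset += filter_fn(new_data)    # concatenate the cleaned data
        policy = train_fn(dataset)        # pi^n replaces the old chauffeur
    return policy
```

The human driver appears only in building `human_data`; every later round is driven by the previously trained policy.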
To implement and test our approach, we have been developing an open-source platform, OSCAR (Open-Source robotic Car Architecture for Research and education) [23]. The OSCAR platform is an open-source and full-stack robotic car architecture designed to enhance robotic research and teaching in the context of self-driving cars. Fig. 6 shows the hierarchical and modular structure of OSCAR. The Robot Operating System (ROS) [24] is the backend system of OSCAR. The OSCAR platform supports two vehicles: fusion and rover. The fusion is largely based on car_demo from OSRF, which was originally developed to test the energy efficiency of a simulated Toyota Prius [25]. The backend system of the rover is the PX4 Autopilot, with ROS communicating with PX4 running on hardware or on the Gazebo simulator. In this paper, we use the fusion vehicle of OSCAR.

Fig. 4. (a) The Data Collection Module represents the data collecting process of each AI chauffeur. Assuming that n is the AI chauffeur's unique ID and n = 0 means the initial human driver, we begin with the initial observation o^n(1) from the initial state s^n(1) and the known policy π^{n-1}_θ. The initial observation leads to the first action a^n(1), which aids in finding the second state s^n(2); the second state leads to the second observation o^n(2), which leads to the second action a^n(2), and so on. (b) The repeated process of training a neural network while collecting more data. The human driver data are used to train a DNN to find the policy π^0_θ. The trained DNN is the 1st AI chauffeur, which uses the learned policy π^0_θ to collect more data that are then used to train the 2nd AI chauffeur, and so on.

A. VEHICLE DESIGN
The chassis of the car is based on the Ford Fusion model. Three cameras are mounted on the front windshield; in this article, we used only the front camera. Ouster's 64-channel 3D LiDAR [26] is mounted on top of the windshield, although we did not employ the LiDAR sensor for this paper. The simulated car is built as a plugin within the fusion ROS package. The vehicle's parameters, such as maximum speed, braking torques, and maximum steering angles, may be adjusted inside the URDF (Unified Robot Description Format) model. Fig. 7 shows the appearance of the vehicle and its sensor readings.

B. DATA COLLECTION
A simulated vehicle in the OSCAR platform publishes its current velocity and position in a ROS topic named /base_pose_ground_truth. The current steering angle and throttle position of the vehicle are sent through a ROS topic named /fusion. The front camera is used to collect data in this paper; the topic name of the camera image is /fusion/front_camera/image_raw. The image message must be converted to be saved as an image file, which can be done with cv_bridge [27]. Fig. 8 shows the high-level view of the data collection system.
The steering angle ranges from 0 to 1 (center to left) and 0 to −1 (center to right). The driving steering wheel has a 450° maximum rotation angle in each direction. This means that steering angles 0 to 1 map to 0 to 450°, and 0 to −1 map to 0 to −450°. We used a Logitech G920 Driving Force racing wheel with dual-motor force feedback, pedals, and a gear shifter mounted on a stand (see Fig. 9).
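The mapping between the normalized steering value and the wheel rotation can be expressed directly; a small sketch of that conversion:

```python
WHEEL_MAX_DEG = 450.0  # G920 maximum rotation in each direction

def to_wheel_degrees(normalized):
    """Map a normalized steering value in [-1, 1] to wheel degrees.
    Positive values steer left, negative values steer right."""
    return normalized * WHEEL_MAX_DEG

def to_normalized(degrees):
    """Inverse mapping from wheel degrees back to [-1, 1]."""
    return degrees / WHEEL_MAX_DEG
```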
The simulated camera has a resolution of 800 × 800. We chose the Region of Interest (ROI) as (0, 380) - (799, 530), which includes the road section in front of the car while omitting the sky and the front hood of the ego vehicle. The ROI will be utilized in the training of a neural network. A camera input image's ROI will be cropped (800 × 151), shrunk to 160 × 160, and fed into the neural network. See Fig. 10 for more details.
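A minimal sketch of that cropping and shrinking step, using plain NumPy nearest-neighbour resampling (the paper does not state which interpolation method is used; in practice `cv2.resize` would be typical):

```python
import numpy as np

def preprocess(image):
    """Crop the ROI (0, 380)-(799, 530) and shrink it to 160x160.

    image: (800, 800, 3) array from the simulated front camera.
    """
    roi = image[380:531, 0:800]  # 151 rows x 800 columns of road area
    # nearest-neighbour resize to 160x160
    rows = np.arange(160) * roi.shape[0] // 160
    cols = np.arange(160) * roi.shape[1] // 160
    return roi[rows][:, cols]
```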

C. NEURAL NETWORK ARCHITECTURE
We designed a convolutional neural network named BIMINet (Bio-Inspired Machine Intelligence Network), shown in Fig. 11. The architecture of the network is inspired by PilotNet [5], in which no max-pooling layers are used. There are five convolutional layers and five fully connected layers. The first three convolutional layers have 5 × 5 filters with 2 × 2 strides. The last two convolutional layers have 3 × 3 filters.

Fig. 5. High-level system overview. The data collected can be minimal, just enough to train a neural network to drive a short distance where a few straight roads and some curved roads exist. (c) Neural network training system: the camera input data are fed to the neural network training module, which learns to infer the steering angle associated with the input. (d) The trained neural network is deployed to the AI chauffeur, which drives the vehicle as data are collected. The initial driving performance cannot be expected to be good, but the AI chauffeur will be able to drive a little farther than in the previous step. After disconnecting the data flow from the human driver, by continuing the loop, a slightly smarter neural network replaces the old one in the AI chauffeur at every single loop. This incremental training and driving eventually makes the AI chauffeur's neural network better and better at driving.

1) Training
To improve the trainability of the proposed network, we used three data preprocessing steps before using the data for training.

a: Data Normalization
First, data normalization was carried out; it should be noted that this is not image normalization. Because most driving circumstances involve long stretches of straight road, the data collection module acquires substantially more data with near-zero steering angles. We must normalize the data before feeding them to the training neural network by deleting part of the data with near-zero steering angles; otherwise, unless well-distributed datasets are used, the neural network may struggle to establish the relationship between input images and steering angles. We established a criterion of 200 as the maximum number of data samples that may be retained per steering-angle value. We also applied some pixel-based preprocessing.
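The steering-angle down-sampling just described could look like the following sketch; the bin width and the random selection of surplus samples are assumptions, since the text specifies only the cap of 200:

```python
import random
from collections import defaultdict

def balance_dataset(samples, bin_width=0.05, cap=200):
    """Keep at most `cap` samples per steering-angle bin.

    samples: list of (image, steering_angle) pairs.
    """
    bins = defaultdict(list)
    for image, angle in samples:
        bins[round(angle / bin_width)].append((image, angle))
    kept = []
    for group in bins.values():
        random.shuffle(group)   # drop surplus samples at random
        kept.extend(group[:cap])
    return kept
```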
The min-max normalization is applied to the input images. Fig. 12 shows an example of data normalization.

b: Data Augmentation
We used brightness changes and affine transformations to expand the quantity of training data and produce additional variation in the datasets. The brightness of the original image was changed within a range from 50% darker to 15% brighter. When an image is shifted through an affine transformation, the accompanying steering angle must be re-adjusted as well.
We shift up to three pixels vertically and a comparatively large 20 pixels horizontally, left or right. A maximum shift of ±0.2 along the x-axis is applied to the steering angle. For example, if a warp affine shifts the image 20 pixels to the right, the steering angle changes by 0.2, corresponding to 90° of steering-wheel rotation to the right.
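The label correction for a horizontal warp is then a linear rule, 0.2 steering units per 20 pixels (0.01 per pixel); the sign convention below is an assumption based on the simulator's left-positive steering:

```python
def shifted_label(angle, dx_pixels):
    """Re-adjust the steering label after shifting the image dx_pixels
    to the right (negative dx_pixels shifts left). A 20-px right shift
    changes the angle by 0.2, i.e. 90 degrees of wheel rotation."""
    return angle - (0.2 / 20.0) * dx_pixels
```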

c: Steering Angle Scaling
To improve the accuracy of steering angle prediction, we can map the −1 to 1 values to values between −450° and 450°.
In an actual driving condition, only a small portion of the steering angle range is exercised. For this work, we used a scale factor of 5. Steering angles are scaled by this predetermined factor during neural network training and testing.
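A sketch of the scaling step: labels are multiplied by the factor before training, and network predictions are divided by it at inference time:

```python
SCALE_FACTOR = 5.0  # scale factor used in this work

def scale_label(angle):
    """Scale a steering label before it is used as a training target."""
    return angle * SCALE_FACTOR

def unscale_prediction(pred):
    """Map a network output back to the [-1, 1] steering range."""
    return pred / SCALE_FACTOR
```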

d: Minimal Manual Editing
Minimal manual editing is necessary at each iteration to remove poor behaviors from the autonomously driven data, such as crossing the left or right lanes. We identify the times when the car crashed or displayed abnormal behavior and then remove those images from the acquired data. The rebuild_csv.py program, which automatically generates a CSV file based on the images in a given directory, can be used to reconstruct the metadata file.

e: Neural Network Training
The BIMINet CNN receives the preprocessed data. Fig. 13 depicts the whole training procedure. Before being fed into the neural network, the recorded driving data are normalized. The labeled steering angle is utilized to decrease prediction errors by backpropagating through the network the difference between the desired and predicted steering angles. The training process is repeated until the validation loss stops decreasing for three epochs.

2) AI Chauffeur
An AI chauffeur was built as a ROS node with a basic speed control technique. There is no direct control over the vehicle's speed in the simulator; only the throttle and brake may be used to regulate it. If we do not activate the brake, the car will eventually reach its maximum speed since the base throttle is set to the maximum; we can picture a driver pressing the gas pedal all the way down and keeping it there. To control longitudinal acceleration, we employed two indicators: the predicted steering angle and the current velocity. If the predicted steering angle is more than 0.25 (112.5°), there may be a risky bend or sharp turn ahead, and if the car maintains its present speed, there is a substantial risk of going off the road. The AI chauffeur then applies the brake for 1.7 seconds at 50% capacity and decreases the throttle to 10% of its maximum setting. In addition, because the training and testing tracks had twisting roads, the maximum velocity was set at 20. We present a new metric for comparing driving performance fairly. AI chauffeurs can effectively drive the car around the entire track; however, some of them may finish the circuit with several needless lane changes and/or off-roading. To distinguish reckless driving from acceptable driving performance, we employed the autonomy metric, which assigns time penalties to undesirable actions such as lane violations:

autonomy (%) = (1 − (W_inf × P_w + Y_inf × P_y) / T) × 100

where W_inf is the number of white lane infringements, P_w is the penalty for W_inf in seconds, Y_inf is the number of yellow lane infringements, P_y is the penalty for Y_inf in seconds, and T is the total travel time in seconds. We used 5 for P_w and 10 for P_y. This metric is inspired by the E2E learning for lane-keeping [5].
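A small sketch of this metric, assuming the penalties are charged as lost seconds against the total travel time, in the spirit of the autonomy metric of [5]:

```python
def autonomy(total_time_s, white_infringements, yellow_infringements,
             p_white=5.0, p_yellow=10.0):
    """Autonomy in percent: time penalties for lane infringements
    are subtracted from the total travel time."""
    penalty = white_infringements * p_white + yellow_infringements * p_yellow
    return (1.0 - penalty / total_time_s) * 100.0
```

For example, two white-lane and one yellow-lane infringement on a 400-second lap cost 20 penalty seconds, giving 95% autonomy.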

3) Steering Angle Prediction in Driving
We also provide a tool by which a sequence of images are generated with information such as input image file name, labeled steering angle, predicted steering angle, the absolute value of the prediction error, vehicle velocity and position, and degree angles. Fig. 14 shows a screenshot of the data visualization.

IV. EXPERIMENTAL RESULTS
To assess the suggested strategy, we used the OSCAR project, which allowed us to simulate the vehicle, sensors, and environments. OSCAR is made up of ROS packages and a convolutional neural network training/testing framework with handy utility modules that allow us to test the performance of trained neural networks, visualize activation maps, show traveled maps with distances, and create driving videos from gathered datasets.
The system on which all data collection, training, and testing took place is as follows. The computer is a Dell OptiPlex-7040 with an Intel i7-6700 CPU at 3.40 GHz and 32 GiB of main memory. The graphics card is NVIDIA's GeForce GTX 745 (PCIe/SSE2). The operating system is Ubuntu 18.04.5 LTS, with CUDA 9 and cuDNN 7.1.2.
The findings of the experiments, as well as data and other material, are available online.

A. TRACK DESIGN
We designed two different tracks in the Gazebo world format [28] (Track A and B). Track A was utilized to collect data for neural network training. The original track design came from Dataspeed ADAS kit Gazebo/ROS [29] simulator, which includes a track for a lane-keeping demonstration. The track is created with modular road segment models that include straight road segments of 50, 100, and 200 meters and radius curve road portions of 50 and 100 meters. We embellished the track with roadside objects such as gas stations, residences, and major architectural complexes. This is important to give some variations in the roadsides to make the lane-keeping task more practical and realistic (See Fig. 15). In this paper, all data collection tasks were conducted for training in Track A.
To appropriately assess a trained neural network, a new track must be utilized in the testing phase since the neural network may recall the track where it was trained. Track B shown in Fig. 16 was created to give a different environment for testing purposes. This testing track has more curving roads with varying curvatures, a variety of straight roads, and more and larger architectural structures.

B. IMPLEMENTATION
Using the Gazebo simulator, we created ROS packages to facilitate communication between simulated vehicles and their surroundings. Gazebo's ROS integration enabled the simulation of sensors and vehicle dynamics. Two additional ROS nodes were implemented: data_collection and run_neural. The data_collection node subscribes to three ROS topics to acquire driving data from the simulated sensors and vehicle dynamics. The run_neural node acts as the AI chauffeur, using a trained neural network to drive the simulated vehicle by publishing vehicle-control ROS topics such as steering, gear shift, throttle, and brake. ROS Melodic with Gazebo 9 was used for this paper. To increase the reproducibility of our work, we chose the Anaconda individual edition [30], which includes data science and machine learning tools as well as package management; a package environment in one system may be easily reproduced in another using Anaconda.
We used Keras 2.2.5 with TensorFlow GPU 1.12 as a backend system to implement the BIMINet CNN. All Python libraries that are necessary for this paper can be installed by an Anaconda environment file in the main folder of the OSCAR project, oscar/config/conda/environment.yaml so that the environment can be easily created and reproduce our work.
We alternated data collection and training to evaluate the proposed method.

C. DATA COLLECTION
We began by gathering one minute of data from a human driver's driving. A total of 1,061 camera images were recorded from the human driver, along with additional driving data synchronized with the camera input, such as time, throttle, velocity, and position at each point when a camera image was received. The data collection frequency ranges between 14 and 17 Hz.
Then, using the human driving data, we trained a neural network. The Mean Square Error (MSE) of the steering angle predictions decreased until epoch 11 (see Fig. 18 (a)). The saturated MSE value is about 0.05, implying an average error of 0.224. Because we selected 5 as the scale factor, the actual error is 0.0448, which translates to around 20° of steering-wheel rotation. A 10° turn of the steering wheel (essentially a joystick) has no discernible effect on the steering angle. Consequently, given that only a tiny dataset was utilized in this first attempt, we may interpret the outcome as acceptable. However, we must assume that the steering angle prediction may not be accurate enough to manage some critical places, such as strongly curved roadways.
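The error arithmetic above can be verified directly: the average error is the square root of the MSE, divided by the scale factor of 5 and then mapped to wheel degrees:

```python
import math

mse = 0.05                            # saturated validation MSE (scaled units)
avg_error = math.sqrt(mse)            # ~0.224 in scaled steering units
actual_error = avg_error / 5.0        # undo the scale factor: ~0.0448
wheel_degrees = actual_error * 450.0  # ~20 degrees of wheel rotation
```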
In all, 31,103 rows of driving data were gathered, 3,219 data rows were removed due to the AI chauffeur's bad behavior during data collection, and 27,884 data rows were used. The number of data items from the human driver is 1,061. Since the data collection frequency is about 14 to 17 Hz, it took roughly one minute to acquire the initial data to start training a neural network. Only 3.81% (1,061 out of 27,884) of the total data came from the human driver. The AI chauffeur was used to collect seven data sets in an incremental way after one minute of human driving. See Fig. 17 for more details.
In the trial in which we attempted to collect two minutes of data, the AI chauffeur could only drive for one minute and 42 seconds (labeled 1m42s). After eliminating 140 data rows that displayed abnormal behavior, we used the 1m42s data to train a neural network again to gather two minutes of data. The newly trained network was then deployed to the AI chauffeur, which could then drive for two minutes. We repeated this alternation between data collection and training on the collected data without a human driver in the loop. Data collection was conducted up to eight minutes because it takes about eight minutes to completely drive the training track.

D. TRAINING AND TESTING IN TRACK A
For predicting steering angles, we employed Mean Square Error (MSE) as the training loss function. For gradient descent, the Adam optimizer with a learning rate of 0.001 was used. The training-to-validation ratio was 7:3, and the batch size was sixteen. We implemented an early-stopping callback that monitors the validation loss: if it fails to improve for three consecutive epochs, training is terminated. The rows of input data were shuffled for each batch.
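The early-stopping rule above can be expressed as a small monitor class. This is a hypothetical re-implementation of the callback's logic, not the authors' code; in Keras the equivalent off-the-shelf pieces are `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)`, `Adam(learning_rate=0.001)`, and `loss="mse"`.

```python
class EarlyStopping:
    """Stop training when the validation loss has not improved for
    `patience` consecutive epochs (sketch of the callback described above)."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.wait = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss   # new best: reset the counter
            self.wait = 0
        else:
            self.wait += 1         # no improvement this epoch
        return self.wait >= self.patience
```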
The training graphs are shown in Fig. 18, where we can see that training in all eight cases was successfully conducted. Note that the scale of the y-axis differs between trainings; for example, Fig. 18 (a) appears better than Fig. 18 (h), but the scale in (a) is ten times that of (h). Fig. 19 shows scatter plots of ground-truth values versus predicted steering angles. In general, all graphs show highly positive correlations between labels and predictions, indicating that all trained neural networks have decent prediction performance. Looking at the graphs more carefully, (g) and (h) are better than (a)-(e) in Fig. 19, since a narrower plot indicates smaller prediction errors. The results are reasonable considering that longer, higher-quality datasets are better for training a neural network.
Steering angle values predicted by the neural networks are shown in Fig. 20 together with the ground-truth values. As the graph shows, only small errors occur. Fig. 21 shows that the Mean Absolute Error (MAE) decreases as longer datasets are used.
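The MAE reported in Fig. 21 is the standard metric and can be computed directly; this short sketch (with made-up example values) is included only to make the metric explicit.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """MAE between ground-truth and predicted steering angles (Fig. 21 metric)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred)))
```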
We also report the traveled distance of the vehicle on Track A map during data collection in Fig. 22.
As Fig. 18-Fig. 22 show, we were able to train a neural network for the whole track with only one minute of driving data from a human driver. To summarize, the procedure is as follows. The human driver was removed from the loop after we gathered the initial training data. The first trained neural network was deployed as the driving controller for an AI chauffeur, who could drive for one minute and 42 seconds while we collected the chauffeur's driving data. We trained the neural network labeled 1m42s on the data obtained from the AI chauffeur and deployed it to the AI chauffeur for a two-minute drive, again collecting driving data. The obtained two-minute driving data was used to train a new neural network. These two phases, data collection and training a new neural network, were repeated until the AI chauffeur could drive the whole of Track A.

E. TESTING IN TRACK B
We created another track (Track B) to test whether a vehicle under the proposed approach could drive on a new track. This track contains a variety of curves with varied ambient elements to verify the trained neural networks' ability to predict steering angles correctly even on a different course. We designed two driving conditions for testing; the detailed values of the driving strategies can be found in Table 2. Under Condition 1, the AI chauffeur with the 8m neural network drove Track B. The AI chauffeur was able to drive the vehicle over more than 70% of the track even though it had not seen the track before. We then conducted further experiments on the other eight cases for future reference. Fig. 23 shows the distance that each AI chauffeur, with its corresponding neural network controller, drove. Fig. 24 and Table 3 present the completion rate along with the autonomy metric. Unsurprisingly, the earlier neural networks show lower performance on the new track. We then tested the trained AI chauffeurs on Track B under Condition 2, in which the AI chauffeurs drive a little slower and brake a little harder. Under Condition 2, the AI chauffeur with the 8m neural network drove almost perfectly. We also ran the other eight cases for future reference; the AI chauffeurs with the 6m and 7m neural networks drove perfectly as well. See Fig. 25 and Fig. 26 with Table 4 for more details.
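The two Track B metrics can be sketched as follows. The completion rate follows directly from the reported distances; the autonomy metric is not defined in this section, so the sketch below uses a common convention from the end-to-end driving literature (each intervention is charged a fixed time penalty), which is an assumption and may differ from the exact definition behind Table 3.

```python
def completion_rate(distance_driven_m, track_length_m):
    """Percentage of the track covered before the run ended (Fig. 23/24)."""
    return min(distance_driven_m / track_length_m, 1.0) * 100.0

def autonomy(num_interventions, elapsed_s, penalty_s=6.0):
    """Hypothetical autonomy metric: each intervention costs `penalty_s`
    seconds of the elapsed driving time. Assumed convention, not necessarily
    the exact definition used in Table 3."""
    return (1.0 - num_interventions * penalty_s / elapsed_s) * 100.0
```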

F. ACTIVATION MAPS
We overlay activation maps on the final convolutional layer to determine which part of the input had the most influence on the steering angle prediction. In Fig. 27, the left figure in each label is a camera image input, and the right is the corresponding activation map on the region of interest. The activation uses the blue-white-red diverging colormap, where dark blue is a small activation value and dark red is a high activation value. Fig. 27 (a) demonstrates low overall confidence but some activation on the road's edges. Surprisingly, up to the five-minute data, Fig. 27 (d)-(f), the neural networks learned that the shapes of off-road objects are more important than the road itself. In the absence of variety on both sides of the road, this can be a suitable strategy for the neural networks to minimize the MSEs of their predictions. As more data were fed into the neural network training, (g)-(i), the neural networks had to adjust their strategy to reduce MSEs caused by the various objects on the left and/or right side of the road that are not supposed to affect the steering angle predictions.
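The visualization in Fig. 27 can be approximated by upsampling the final-layer activation map to the image size, passing it through a blue-white-red diverging colormap, and blending it with the camera image. The function below is a hypothetical NumPy re-implementation of that overlay, not the authors' code; the nearest-neighbour upsampling and the piecewise-linear colormap are simplifying choices.

```python
import numpy as np

def overlay_activation(image, activation, alpha=0.5):
    """Blend a coarse activation map over an image.
    image: (H, W, 3) float array in [0, 1]; activation: (h, w) float array.
    Low activations render blue, mid white, high red (diverging colormap)."""
    H, W, _ = image.shape
    # Nearest-neighbour upsampling of the coarse activation map to (H, W).
    rows = np.arange(H) * activation.shape[0] // H
    cols = np.arange(W) * activation.shape[1] // W
    act = activation[np.ix_(rows, cols)]
    # Normalize to [0, 1]; 0.5 maps to white in the diverging colormap.
    act = (act - act.min()) / (act.max() - act.min() + 1e-8)
    # Blue (low) -> white (mid) -> red (high).
    r = np.clip(2.0 * act, 0.0, 1.0)
    b = np.clip(2.0 * (1.0 - act), 0.0, 1.0)
    g = 1.0 - np.abs(2.0 * act - 1.0)
    heat = np.stack([r, g, b], axis=-1)
    return (1.0 - alpha) * image + alpha * heat
```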

V. CONCLUSION
In this paper, we propose a unique BC process in which data collection and neural network training are done incrementally, with little initial data from a human driver. Only 3.81% (1,061 out of 27,884) of the total data came from a human driver; the AI chauffeur collected seven data sets incrementally after approximately one minute of human driving. The training graphs showed that training in all eight cases was successfully conducted. The scatter plots of ground-truth values against predicted steering angles showed highly positive correlations between labels and predictions, indicating that all trained neural networks have decent prediction performance. Extensive testing on two separate tracks under two different driving conditions successfully validated the proposed method.