Test Your Self-Driving Algorithm: An Overview of Publicly Available Driving Datasets and Virtual Testing Environments

Many companies aim to deliver autonomous driving systems that reach SAE Level 5. As these systems run much more complex software than today's typical premium cars, a thorough testing strategy is needed. Early prototyping of such systems can be supported using recorded data from on-board and surrounding sensors as long as open-loop testing is applicable; later, though, closed-loop testing is necessary, either on the real vehicle or in a virtual testing environment. This paper is a substantial extension of our work presented at the 2017 IEEE International Conference on Intelligent Transportation Systems (ITSC), which surveyed publicly available driving datasets. Our previous results are extended by additional datasets and complemented with a summary of publicly available virtual testing environments to support closed-loop testing. In total, 37 datasets for open-loop testing and 22 virtual testing environments for closed-loop testing have been surveyed in detail. Research toward autonomous driving is thus significantly supported by complementary community efforts: a growing number of publicly accessible datasets allows for experiments with perception approaches or for training and testing machine-learning-based algorithms, while virtual testing environments enable end-to-end simulations.

Index Terms: Driving dataset, virtual testing environment, simulation, self-driving vehicle, autonomous driving.

I. INTRODUCTION
VEHICLES with self-driving functionality are currently entering the product portfolios of all major automotive original equipment manufacturers (OEMs). In addition, a growing number of start-ups around the world are aiming at delivering solutions towards SAE Level 5 functionality. These vehicles will substantially change how people access and use mobility solutions in the future; in addition, this change in how mobility is consumed will also reshape how metropolitan regions are designed to allow for a better and more sustainable co-existence of various mobility solutions like bicycles, electric motorcycles, cars, supply vehicles, trucks, or public transportation.
The algorithms that are needed to realize autonomously acting mobility solutions are becoming increasingly complex, as SAE Level 5 vehicles need to be able to act safely in any traffic situation without the need for a human driver. Therefore, careful testing and thorough evaluation of the individual software units that comprise a self-driving vehicle are mandatory, including the use of open-loop stimuli from recordings to include realistic situations or for training and testing machine-learning (ML)-based algorithms. Complementary thereto, closed-loop testing using virtual testing environments is needed to enable end-to-end validation of both individual software units and the complete data processing chain. Finally, new functionality is validated in prototypical vehicle platforms that are specifically instrumented to conduct measurements for systematic analysis of a functionality's behavior in real-world settings.
This article is a substantially extended version of [1] "When to Use What Data Set for Your Self-Driving Car Algorithm: An Overview of Publicly Available Driving Datasets". In contrast to our previous work, the main differences are: (a) the presentation of publicly available datasets was updated to include ten additional, recently published datasets; (b) the description of the individual datasets was extended to include typical application scenarios for users; and (c) we complemented our previous work by additionally surveying the area of virtual testing approaches using simulations. Thereby, this article covers both open-loop and closed-loop approaches for evaluating algorithms in the area of self-driving vehicles.

A. Background
While work on self-driving vehicle functionality dates back to 1939, when GM outlined a vision of vehicles without human intervention at the World's Fair in New York, it was only seven decades later, in 2007, that the first large-scale demonstration of several autonomously driving vehicles in an urban-like environment was conducted, known as the DARPA Urban Challenge (cf. [2]). Only today do the necessary perception technology, computational performance, and algorithmic approaches seem to be available to let the vision become reality.
This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/
Compared to auto-pilot systems for commercial flights, automotive systems are much more complex as they have to cope with a large variety of complex traffic situations with potentially very unpredictable traffic participants. Thorough testing of such systems is therefore continuously necessary to cover all possible traffic scenarios. The complexity is also illustrated by the SAE classification for autonomous driving, where the two topmost levels require handling of all or nearly all possible traffic scenarios even when no human driver is on board as a fallback (Level 3 or higher). Typical vehicle testing of today as reported in [3] also includes prototypical platforms specifically instrumented for measurements; these vehicles provide the real dynamics and physical characteristics that are difficult to model accurately in purely virtual testing approaches, as correct physical behavior is essential for ultimately testing or tuning a specific vehicle function.

B. Problem Domain & Motivation
The traditional approach to testing said algorithms mainly involves the use of recorded data from the target platform. While recorded data has the highest degree of fidelity in terms of realism, it can only be used for open-loop testing, for example to stimulate perception algorithms. Furthermore, such approaches have gained a lot of attention recently due to significant achievements in AI and ML, which require large amounts of data for, e.g., end-to-end learning [4].
While designing, collecting, and labeling new data to evaluate algorithms for a self-driving vehicle is resource-intensive, time-consuming, and in some cases weather-dependent, there are many datasets publicly available to the research community. The work presented here supports researchers and developers in getting an overview of those datasets (e.g., what datasets are available, which sensors are included, and what situations are covered) and provides guidance during the selection of existing datasets.
However, systematic limitations, such as usefulness for open-loop testing only, as well as practical limitations, such as time, weather, and vehicular recording platforms, constrain the role of such data collection approaches in practice. Therefore, complementary approaches that overcome the limitation to open-loop use on the one hand and allow for scalability on the other hand are necessary to tackle the growing testing needs for autonomous driving systems on SAE Levels 3 to 5.

C. Research Goal & Research Questions
The goal of this work is to present an extensive overview of publicly accessible datasets and instruments to support both open-loop and closed-loop testing, resulting in the following two research questions:
RQ-1 What datasets are available to support what type of testing for self-driving vehicular algorithms?
RQ-2 What virtual testing environments are available to support what type of closed-loop testing of self-driving vehicular algorithms?

D. Contributions
The work presented in this article is based on our previous work [1] presented at the 2017 IEEE International Conference on Intelligent Transportation Systems. We substantially updated the existing work by extending the coverage of publicly accessible datasets to 37 datasets in total and thus include the most recently published datasets as guidance for selecting the right dataset for evaluating an algorithm.
Furthermore, we provide a survey of existing virtual testing environments to complement our existing work with approaches that enable researchers and developers to evaluate their algorithms in a closed-loop environment, i.e., an environment that receives data from the system-under-test, adjusts the simulated world accordingly, and derives the next stimulus for the simulated sensors for the following time-step.
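To make the closed-loop cycle described above concrete, the following minimal Python sketch runs such a loop on a toy one-dimensional world. It is an illustration only and does not correspond to any of the surveyed environments: the `World` class, the range sensor, the braking rule, and all numeric values are hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class World:
    """Hypothetical 1-D simulated world: ego vehicle approaching an obstacle."""
    ego_position: float = 0.0        # m
    ego_speed: float = 10.0          # m/s
    obstacle_position: float = 50.0  # m


def simulated_sensor(world: World) -> float:
    """Derive the stimulus of a hypothetical range sensor: gap to the obstacle in m."""
    return world.obstacle_position - world.ego_position


def system_under_test(distance: float) -> float:
    """Toy stand-in for the algorithm under test; returns an acceleration in m/s^2."""
    return -5.0 if distance < 30.0 else 0.0


def run_closed_loop(world: World, dt: float = 0.1, steps: int = 100) -> World:
    for _ in range(steps):
        stimulus = simulated_sensor(world)     # stimulus for this time-step
        command = system_under_test(stimulus)  # data received from the system-under-test
        # Adjust the simulated world according to the received command.
        world.ego_speed = max(0.0, world.ego_speed + command * dt)
        world.ego_position += world.ego_speed * dt
    return world


final = run_closed_loop(World())
print(f"final speed: {final.ego_speed:.1f} m/s, remaining gap: {simulated_sensor(final):.1f} m")
```

In an open-loop setting, by contrast, the sequence of stimuli would be fixed by a recording and the commands of the system-under-test could not influence the next sensor reading.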

E. Scope & Limitations
The goal of our work was to carefully conduct a broad survey that provides an exhaustive overview of existing datasets and virtual testing environments supporting research and development of autonomous driving algorithms. This work focuses particularly on datasets and virtual testing environments that could be identified using structured web searches and systematic snowballing and that are accessible to researchers and developers. For the datasets, we focus only on ground truth driving data collected on public roads with partial or full open access. For the virtual testing environments, we focus especially on solutions available as open source to encourage and facilitate contributions from the community. A deep analysis per dataset or virtual testing environment is very specific to particular use cases in the development or evaluation of autonomous driving. Hence, it would not contribute to the overall goal of our work but is rather suggested for specialized subsequent studies.

F. Structure of the Article
The rest of the article is structured as follows: Section II outlines relevant related work. In Section III, we outline the approach that we applied to survey the area to obtain information on publicly accessible datasets and virtual testing environments. Section IV presents and discusses the results of our findings. We conclude our work in Section V.

II. RELATED WORK
In the work by [5], results from a research collaboration with a large European automotive OEM in Germany are presented. The authors studied how consumer tests, using the example of autonomous emergency braking (AEB), can be modeled for a virtual testing environment and massively scaled in an automated way to enable a broad range of testing according to state-of-the-art test catalogs for this type of sensor-based system. The findings demonstrated that generating hundreds of simulations with systematic variation of key parameters for the system-under-test helps to unveil unexpected anomalies to be addressed before conducting tests on a real proving ground.
In the approach of [3], the authors report on a large-scale interview study approaching scientists and industrial practitioners to explore the current state of the art and future trends in the area of developing and testing active safety systems and systems aiming for self-driving functionality. The main finding on relevant future trends, supported by feedback from both practitioners and researchers in the area and by relevant literature, is that the importance of virtual testing will significantly increase to improve test efficiency or even certification. However, the biggest remaining hurdles include the level of fidelity of a virtual testing environment compared to data from real test runs, and thus the lack of clear benchmarking, as well as the need for better models for vehicle motion, sensors, and traffic situations.
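The idea of systematically varying key parameters to generate a large number of simulation runs can be sketched as a plain parameter sweep. The sketch below is not the tooling used in [5]; the parameter names, value ranges, and the plausibility filter are assumptions chosen purely for illustration.

```python
from itertools import product

# Hypothetical parameter ranges for an AEB-like scenario (illustrative values only).
EGO_SPEEDS_KMH = [20, 30, 40, 50, 60]
TARGET_SPEEDS_KMH = [0, 10, 20]
INITIAL_GAPS_M = [20, 40, 60, 80]
LATERAL_OFFSETS_M = [-0.5, 0.0, 0.5]


def generate_scenarios():
    """Yield one scenario description per parameter combination."""
    combinations = product(
        EGO_SPEEDS_KMH, TARGET_SPEEDS_KMH, INITIAL_GAPS_M, LATERAL_OFFSETS_M
    )
    for ego, target, gap, offset in combinations:
        if target >= ego:
            continue  # no closing speed, so the braking function is never exercised
        yield {
            "ego_speed_kmh": ego,
            "target_speed_kmh": target,
            "initial_gap_m": gap,
            "lateral_offset_m": offset,
        }


scenarios = list(generate_scenarios())
print(f"{len(scenarios)} scenarios generated")
```

Each generated dictionary would then be handed to a virtual testing environment as the configuration of one simulation run; the interface for doing so depends on the chosen environment.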
In the study [6], it was demonstrated that both areas can be combined to improve the way systems are tested. It was studied how patterns from virtual testing data can be matched in data recordings from real sensors with the goal of finding interesting scenarios in reality for further analysis. However, instead of manual annotations for the datasets, matching scenarios were automatically identified.
Janai et al. [7] also recently surveyed a broad spectrum of datasets with a focus on computer vision in general. In contrast to our previous work [1] and to this substantially extended version, the datasets presented by Janai et al. are of only limited applicability to autonomous driving.
Virtual testing environments, on the other hand, were the subject of a recent study [8], in which various testing approaches in such environments were classified and discussed. In addition, that study proposed a new testing framework for virtual testing environments, which combines different testing methods as a quantitative way to test the intelligence of an autonomous vehicle.
As an example of related work combining datasets and virtual testing environments, [9] recently proposed a new routing algorithm for electric vehicles. The algorithm employed data mining techniques on a set of historical driving data and was eventually tested and evaluated in a virtual environment on a dataset that also contained real vehicle state information.

A. Surveying the Area of Publicly Accessible Datasets
To exhaustively survey the area of publicly accessible datasets, we decided to use the Google search engine. The main motivation, which is also supported by our results, is that all datasets are accompanied by explanatory websites, where the original contributors of the datasets provide basic information about the content and links to obtain the data; these websites can be easily found and indexed by search engines. To conduct our survey in a systematic way as also described in [1], we applied the following four sequential steps:
1) Initial Google search for exploration: Keywords such as "driving" and "dataset" were used in the Google search engine to initialize the exploration of the most popular dataset web pages. We ranked the search results by relevance and only considered the top 200 results. We observed that most relevant dataset websites were found among the top 100, while nothing relevant was found after 150 results, implying a low risk of missing relevant datasets.
2) Systematic extension by forward snowballing among dataset web pages: Some dataset web pages explicitly provide reference links to other relevant datasets, thereby pointing us to more datasets not covered before.
3) Systematic validation by collecting accompanying scientific publications: A majority of datasets are supported by at least one scientific publication. We collected such publications related to the datasets discovered in the first two steps.
4) Systematic extension by backward snowballing using the publications: We went through the collected publications to identify new datasets referenced by the publications. This process was repeated recursively until no more datasets were found.
Throughout the four steps above, we selected datasets that satisfy the following inclusion criteria:
• Data must be collected from on-board sensors on public roads. We exclude data collected indoors or in confined areas such as parks and campuses, as well as datasets with synthetic data from virtual worlds, as the latter are addressed in our summary of virtual testing environments.
• The dataset must contain camera, LiDAR, or radar data. It is insufficient to include GPS or Inertial Measurement Unit (IMU) data only.
• Full or partial open access.
Our previous survey [1] reported 27 datasets. This article includes ten more datasets (+37%), most of which were released after our previous survey was published.
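The recursive snowballing and the inclusion criteria described above can be sketched in a few lines of Python. The survey itself was conducted manually; the `Candidate` record, its field names, and the catalog entries "A" to "E" are hypothetical and only illustrate the procedure.

```python
from dataclasses import dataclass

PERCEPTION_SENSORS = frozenset({"camera", "lidar", "radar"})


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record describing one candidate dataset."""
    name: str
    sensors: frozenset
    public_roads: bool
    synthetic: bool
    access: str             # "full", "partial", or "none"
    references: tuple = ()  # names of other datasets this one points to


def is_included(c: Candidate) -> bool:
    """Apply the three inclusion criteria listed above."""
    return (
        c.public_roads
        and not c.synthetic
        and bool(c.sensors & PERCEPTION_SENSORS)  # GPS/IMU alone is insufficient
        and c.access in ("full", "partial")
    )


def snowball(seeds, catalog):
    """Follow references from the seeds until no new candidates are found."""
    seen = set(seeds)
    frontier = list(seeds)
    while frontier:
        current = catalog[frontier.pop()]
        for ref in current.references:
            if ref in catalog and ref not in seen:
                seen.add(ref)
                frontier.append(ref)
    return seen


# Made-up catalog entries for illustration only.
catalog = {c.name: c for c in [
    Candidate("A", frozenset({"camera"}), True, False, "full", ("B",)),
    Candidate("B", frozenset({"lidar", "gps"}), True, False, "partial", ("C", "D")),
    Candidate("C", frozenset({"camera"}), False, True, "full"),
    Candidate("D", frozenset({"gps", "imu"}), True, False, "full", ("A",)),
    Candidate("E", frozenset({"radar"}), True, False, "none"),
]}

found = snowball(["A"], catalog)
selected = sorted(name for name in found if is_included(catalog[name]))
print("discovered:", sorted(found))
print("included:", selected)
```

The loop terminates once the frontier is empty, which mirrors the stopping rule of step 4: the search ends when a round of snowballing yields no dataset that has not been seen before.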

B. Surveying the Area of Virtual Testing Environments
Similarly, our methodology for finding existing virtual testing environments involves Google search, publication collection, and systematic snowballing. With the rapid development of intelligent vehicle functionality, driving safety, and AI/ML, a growing number of approaches involving virtual testbeds can be observed. In addition, most of these approaches employ open-source virtual testing environments for reasons of availability and cost. Virtual testing environments were gathered as follows:
1) Direct Google and YouTube search: Keywords such as "virtual environment" and "simulation" were used in both search engines, as well as "vehicle", "traffic", "autonomous driving", and "download" for the purpose of filtering the results and taking accessibility into account. We observed that multiple video examples from different research groups or individuals could be found with the same keyword set, which would lead to excessively redundant results. This also implies a low risk of missing relevant approaches during this phase.
2) Publication collection: A considerable number of approaches concerning autonomous driving, navigation, and other traffic-related research was initially gathered for virtual testing environments. We went through recent and relevant publications with a focus on the simulations.
3) Snowballing among web pages of virtual testing environments: A majority of open-source virtual testing environment projects provide an individual website, from which we collected references and other information. This process complemented the phases above, as not all approaches in this field have open access publications or video illustrations.
Several selection criteria were applied during the survey of virtual testing environments:
• Relevance to autonomous or intelligent driving, vehicle and traffic simulation, etc. Irrelevant virtual testing environments were therefore excluded, such as simulation environments designed for network models, vehicle fabrication, light condition testing, or driver training.
• Accessibility. The environment or facility should be accessible either through open access or as a commercial solution. This excludes several in-house approaches that are only mentioned in conference and journal papers, or in commercial advertisements.

A. Overview of the Datasets
As an update of our previous overview [1], 37 datasets are listed below in alphabetic order with references, provider information, and highlights. A short alias based on the full name of each dataset is indicated in parentheses. In the rest of the article, the alias will be used whenever a specific dataset is referenced. The links have been verified on 2018-10-20.
• Dataset 1: Automotive multi-sensor dataset (AMUSE) [10] (https://goo.gl/1YbD5E)
  Provider: Linköping University, Sweden
  Highlight: Omnidirectional visual data for full surround sensing; includes winter conditions with snow
• [17] (https://goo.gl/qLM3V4)
  Provider: Daimler AG R&D, Germany; Max Planck Institute for Informatics (MPI-IS), Germany; TU Darmstadt Visual Inference Group, Germany
  Highlight: Stereo sequences from 50 cities; pixel-level annotation for semantic urban scene understanding; benchmark suite with an evaluation server; the foundation for a new dataset, CityPersons [18], with better person annotations
• [28] (https://goo.gl/r5aRvv)
  Provider: Heidelberg Collaboratory for Image Processing, Ruprecht-Karls Universität Heidelberg, and Robert Bosch GmbH, Germany
  Highlight: A stereo and optical flow dataset with high accuracy for urban autonomous driving, containing a lot of manually constructed/acted scenarios on the same street
• Dataset 22: Heidelberg benchmarks (Heidelberg) [29] (https://goo.gl/6c2lAs)
  Provider: Heidelberg University, Germany
  Highlight: Associated with an event called Robust Vision Challenge; provides challenging data for stereo and optical flow, e.g., rain flares and flying snow
• [38] (https://goo.gl/nJOQkq)
  Provider: Oxford University, UK
  Highlight: The first dataset stressing periodic long-term data collection (over a year) following predefined routes to cover long-term changes of road conditions
• Dataset 32: Stanford track collection (Stanford) [39] (https://goo.gl/KNOYpX)
  Provider: Stanford University, US
  Highlight: Velodyne 64 point cloud with object labels and GPS/IMU data
• Dataset 33: Ground Truth Stixel Dataset (Stixel) [40] (https://goo.gl/rf12z6)
  Provider: 6D-Vision, Germany
  Highlight: Heavy rain on highways; stixel annotation
• Dataset 34: TorontoCity benchmark (TorontoCity) [41] (Link to be released soon as the provider promised in the paper)
  Provider: University of Toronto, Canada
  Highlight: Data with a wide range of views for mapping, reconstruction and semantic labeling

B. Discussion of the Datasets
After conducting a thorough study of the 37 datasets above, we herein make an intuitive comparison and provide a dataset selection guideline from the following perspectives: (1) time and venue: when and where was the data collected? (2) traffic conditions during the data collection; (3) sensor setup: what sensors were used during the data collection? (4) data format and size; (5) provided resources (e.g., raw data, annotation, benchmark, source code, and tool support); (6) license; and (7) accessibility. The comparison result is summarized in Tables I-V, where each column is a comparison factor among (1)-(5) and each row is a dataset.
Since the dataset URLs, providers, and highlights have already been provided in Section IV-A, they do not reappear in these tables. Furthermore, many datasets are similar with regard to the last two comparison factors license and accessibility. Therefore, they are skipped in the tables and will be discussed separately.
The Time & Venue column in the summary tables indicates that most datasets were released after 2009. Moreover, there is a growing trend in running data collections, especially since 2016. In terms of venue, most data were collected in Europe and the US. Germany is the most active country running data collections. There are about ten datasets with data collected outside Europe and the US: CCSAD from Mexico, several sequences in ESATS from New Zealand, part of JAAD collected from Ukraine and Canada, TorontoCity from Canada, three from China, including one dataset in Daimler pedestrian, Apollo, and TRoM, part of MVD collected in South America, Asia, Africa and Oceania, some data collected in South Korea in KAIST (this is not stated anywhere but observed by us in some sequences), and finally part of nuScenes collected in Singapore. Thus, we strongly urge the future release of new datasets from other regions with wider geographical distribution as each country has its unique traffic conditions. An autonomous driving algorithm that has been successfully tested using a dataset from Germany may not work similarly well in other regions of the world. More global driving data is essential towards more robust autonomous vehicles whose performance is less sensitive to geographical location.
Traffic condition is another key factor to consider while selecting a dataset. We are particularly interested in the type of traffic (e.g., urban traffic, rural road, highway), light conditions (e.g., daylight or night), and weather conditions (e.g., sunny, overcast, rainy, other). Most datasets focus on urban traffic, daylight, and sunny weather. While perfect light and weather conditions are often favored for testing purposes, adverse conditions are sometimes more desirable to increase the robustness of the algorithms under test. Data with adverse conditions can be accessed from a number of datasets: AMUSE, CCSAD, CMU, Dr(eye)ve, ESATS, Elektra, Heidelberg, JAAD, Oxford, Stixel, HCI, and TRoM. We also observe a trend that new datasets such as BDDV and MVD focus on environmental and weather diversity, covering various traffic, weather, and light conditions.
The datasets exhibit a variety of sensor setups. The core sensors are camera, LiDAR, and GPS, the latter often combined with IMU. 35 out of 37 datasets include at least one type of camera; the exceptions are Stanford and TrafficNet. Monocular cameras are more popular than stereo cameras, while color is slightly preferred to grayscale. Little attention has been given to omnidirectional cameras, which are only used in AMUSE and Ford. To our surprise, radar is used only in Apollo, TrafficNet, and nuScenes, even though radar has been widely used in modern vehicles for detecting objects. Our conjecture is that most automotive radars are commercial products with proprietary data formats that cannot be easily released publicly. Other types of sensors can also be observed in specific datasets, such as the monocular infrared camera in Cheddar Gorge, the eye tracker device to capture driver fixation in Dr(eye)ve, the far infrared sensor in Elektra, the airborne LiDAR in TorontoCity, and the thermal camera and beam splitter in KAIST. More advanced sensing devices including multiple types of sensors are found in HCI and TrafficNet, e.g., the mobile mapping system in HCI and Mobileye's vision-based ADAS in TrafficNet.
An interesting phenomenon is that new datasets like BDDV and MVD have started to apply a crowd-sourcing strategy, i.e., raw data is collected by external individuals instead of dedicated teams within the organization. This is an efficient way of enlarging the scale of a dataset. The challenge is how to make the collected data consistent in terms of format, size, and other aspects. We believe that this crowd-sourcing strategy is suitable for image and video collection, optionally with low-precision GPS data, which can be captured by private mobile phones and low-cost personal devices. It is impractical to use this strategy to collect more professional data such as LiDAR point clouds.
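As a small illustration of how the observations above can guide dataset selection, the following Python sketch encodes the dataset groups named in this discussion as sets and intersects them. Only datasets explicitly named above are included; the variable names and the `shortlist` helper are hypothetical, and the complete picture remains in Tables I-V.

```python
# Dataset groups transcribed from the discussion above (not a complete catalog).
ADVERSE_CONDITIONS = {
    "AMUSE", "CCSAD", "CMU", "Dr(eye)ve", "ESATS", "Elektra",
    "Heidelberg", "JAAD", "Oxford", "Stixel", "HCI", "TRoM",
}
RADAR = {"Apollo", "TrafficNet", "nuScenes"}
OMNIDIRECTIONAL_CAMERA = {"AMUSE", "Ford"}
NO_CAMERA = {"Stanford", "TrafficNet"}


def shortlist(*required_groups):
    """Return the datasets that appear in every required group, sorted by name."""
    return sorted(set.intersection(*required_groups))


# Adverse conditions recorded with an omnidirectional camera.
print(shortlist(ADVERSE_CONDITIONS, OMNIDIRECTIONAL_CAMERA))

# Radar datasets that also contain camera data.
print(sorted(RADAR - NO_CAMERA))
```

An empty result, such as for the combination of radar and the listed adverse-condition datasets, points to a gap in the publicly available data rather than to an error in the query.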
Data format is an important factor for dataset selection. In general, standard data formats are favored over proprietary data formats because standard data formats are not tied to specific software, thus allowing for more flexibility. Most datasets share data in standard formats. AMUSE, part of Elektra, Malaga, and Stanford use their own data formats and provide tools for parsing the data. The data format of Cheddar Gorge is still unclear because the only currently available resource for Cheddar Gorge is a scientific paper in which its data format is not described; the data formats of TorontoCity and EuroCity are also unclear.
We also investigated the data size of each collected dataset as shown in the Data format & size column in Tables I-V. The data sizes of the 27 datasets reported in our previous survey [1] were obtained around April 2017, while the data sizes of the ten new datasets were obtained in 2018. We do not aim to show the latest data sizes, as they will probably change over time. The size of most datasets falls in the range of 1-100 GB. The Oxford robotcar dataset is currently the largest of the 37 datasets with 23 TB. The sizes of TorontoCity and EuroCity are unknown because their websites have not been released yet and the sizes are not mentioned in the corresponding publications.
The most fundamental resource provided by a driving dataset is raw sensor data. If the data is collected from multiple sensors, all sensor data must be properly calibrated, synchronized, and accompanied by timestamps. Most of the 37 datasets provide much more than raw sensor data. The typical complementary resources are annotations and labels (e.g., object bounding boxes), benchmark suites, source code, toolkits, scientific publications, and demo videos. The raw sensor data, in particular visual data, is often divided into training, validation, and test sets for different purposes. In addition to raw sensor data, a benchmark is deemed an extremely rewarding and appreciated feature that serves as an open evaluation platform for performance comparison. Various benchmarks are available in Caltech, Cityscapes, German traffic sign, KITTI, Apollo, TorontoCity, and HCI, where the performance of different algorithms submitted by the dataset users is ranked.
We have summarized the typical usage scenarios of the 37 included datasets, which are shown in the last column of Tables I-V. There are a variety of usage scenarios supported by these datasets, such as optical flow and SLAM. The top two usage scenarios supported by most datasets are pedestrian/vehicle detection and semantic segmentation. The most popular datasets for pedestrian detection include Caltech, Daimler pedestrian, ETH pedestrian, and KITTI. The most popular datasets for semantic segmentation are probably Cityscapes and KITTI. The emerging MVD is also starting to gain attention for semantic segmentation. MVD even surpasses Cityscapes in terms of density of object instances per image [37]. Some datasets have a dedicated purpose, e.g., Dr(eye)ve for driver attention analysis, German traffic sign for traffic sign detection, TRoM for road marking detection and classification, and TrafficNet for traffic scenario categorization. By contrast, some datasets like AMUSE, CCSAD, Cheddar Gorge, Ford, and Malaga do not clearly state what usage scenarios they support. KITTI is undoubtedly the most outstanding dataset and benchmark with the most comprehensive coverage of usage scenarios. The most recently released Apollo and BDDV also look promising by virtue of their support for various usage scenarios.
Apart from these summary tables, we also investigated the license and accessibility of these datasets. Regarding legal constraints on using these datasets, 11 datasets have declared the licenses under which they were published. Creative Commons Attribution-NonCommercial-ShareAlike 3.0 (CC-BY-NC-SA) is the most widely adopted license, used by CMU, comma.ai, Karlsruhe labeled objects, Karlsruhe stereo, and KITTI. nuScenes uses version 4.0 of the same license. Elektra and Oxford adopted Creative Commons Attribution-NonCommercial
Although our dataset inclusion criteria require either partial or full open access, accessibility still varies considerably among these datasets. While most datasets allow convenient and direct data download, some make data access more complicated to various degrees. Cityscapes, Elektra, Heidelberg, Dr(eye)ve, BDDV, MVD, and nuScenes require a valid email address to obtain the download links. Cityscapes and MVD are more stringent in the sense that a data user must register an account with a work email address, as private email addresses are not accepted; any new registration is manually inspected, and approval takes a few days. Apollo provides sample data that can be accessed after registering with a mobile phone number. Access to the entire dataset requires an online application in which more detailed information and the motivation must be given. TRoM does not have an official dataset web page, but instead shares the raw data on the Baidu Cloud web disk, with Chinese as its only language option. The publication of TRoM [43] states that a toolkit for road marking annotation is also available together with the dataset. However, we did not find this toolkit on the web disk at the time of writing. TrafficNet classifies driving data into eight different scenarios, although only six scenarios are reported in [42]. Of these eight, only the two scenarios "lane change" and "car following" are available to the public, whereas the other six scenarios are available only to Mcity members. HCI requires a user to install the SDK toolkit provided on its website to download data. Meanwhile, part of HCI is used for the Stereo Geometry Challenge 2016, for which data can be downloaded directly. However, since May 2018 we have observed a dramatic change of the HCI dataset website: data for the Stereo Geometry Challenge 2016 was no longer available; instead, data for the Robust Vision Challenge 2018 was provided upon registration by email. The data of TorontoCity had still not been released at the time of writing, though its authors have promised that it will come soon. Cheddar Gorge can only be obtained by sending hard disks to the provider. EuroCity allows free download for non-commercial use as stated in [26]; however, the EuroCity website had not yet been announced at the time of writing (August 2018).
Despite the comprehensive summary and comparison of datasets, completeness is still a potential threat to validity. It is difficult to rule out the possibility of having missed other relevant datasets; however, we adopted and applied a thorough and rigorous approach to explore and create this overview of datasets. In addition, the factors for dataset comparison are defined on the basis of our expertise and experience from conducting research and development in this area for more than ten years. Other dataset users may be interested in certain aspects or use cases of the datasets that cannot be discussed in detail in this overview and would require specific subsequent studies.

C. Overview of the Virtual Testing Environments
Complementary to our previous work [1], we have added an overview of virtual testing environments to enable closed-loop testing during the development of algorithms for self-driving vehicles. In total, 22 virtual testing environments are listed below in alphabetical order with references, links, and highlights. Names of the providers are also indicated in parentheses where applicable.
Before entering the following discussion, we would like to point out that the datasets listed in Tables I-V typically provide video streams, single image files, or lists of character-separated values (CSV) to be used for offline data processing (e.g., for training neural networks (NN)). The simulation and virtual testing environments in the following sections, though, typically serve use cases where an algorithm is connected to the simulation system to receive stimuli data from the simulation and to relay its output back into the simulation environment. Using a dataset from the aforementioned tables in a simulation system is usually not the intended use case; thus, this paper mainly provides an overview of the testing environments from a use-case-specific perspective. Nevertheless, combining the two is indeed practical and might be discussed and evaluated in detail in future work. The links were verified on 2018-10-20.
Similar to Tables I-V, Table VI is organized such that each row represents a virtual testing environment and each column corresponds to one of the perspectives (1)-(5) above. The "N/A" in this table denotes cases where related information is limited due to accessibility.
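The distinction drawn above between offline dataset processing (open-loop) and coupling an algorithm to a simulation (closed-loop) can be illustrated with a minimal Python sketch. All names here (`ToyLaneSimulator`, `proportional_controller`, `run_open_loop`, `run_closed_loop`) are hypothetical and do not correspond to the API of any surveyed environment; the one-dimensional lane-keeping model is an assumption chosen purely for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ToyLaneSimulator:
    """Hypothetical 1-D stand-in for a virtual testing environment."""
    lateral_offset: float  # distance from the lane centre in metres
    dt: float = 0.1        # simulation step in seconds

    def sense(self) -> float:
        # Stimulus delivered to the algorithm under test.
        return self.lateral_offset

    def actuate(self, lateral_velocity: float) -> None:
        # The algorithm's output changes the simulated state.
        self.lateral_offset += lateral_velocity * self.dt


def proportional_controller(offset: float, gain: float = 1.0) -> float:
    """Toy algorithm under test: steer back towards the lane centre."""
    return -gain * offset


def run_open_loop(recorded_offsets: Iterable[float],
                  algorithm: Callable[[float], float]) -> List[float]:
    # Open loop: recorded data is replayed; the outputs never
    # influence the next input sample.
    return [algorithm(offset) for offset in recorded_offsets]


def run_closed_loop(sim: ToyLaneSimulator,
                    algorithm: Callable[[float], float],
                    steps: int) -> List[float]:
    # Closed loop: every output is relayed back into the simulation
    # and therefore shapes the next stimulus.
    trace = []
    for _ in range(steps):
        sim.actuate(algorithm(sim.sense()))
        trace.append(sim.sense())
    return trace
```

In the open-loop case the algorithm can only be scored against the recording, whereas in the closed-loop case its own decisions determine whether the simulated vehicle converges to the lane centre.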
We believe that such a survey of virtual testing environments from the perspectives listed above is of critical significance, especially in comparison with experiments on real vehicles. Easily accessible virtual environments allow autonomous driving algorithms to be tested in a swift and safe sandbox before they are exposed to unvalidated and thus potentially dangerous situations in real driving scenarios [53]. Taking a step further, open-source virtual testing environments also excel at reducing facility costs (experimental vehicles, sensors, maintenance of the testing field, etc.) to a minimum, while those supporting multiple platforms and/or programming languages offer researchers a wide range of choice by lowering the entry barrier and avoiding unnecessary dependencies.
A majority of the virtual testing environments in our survey are still actively supported and updated. For most open source projects, a full update and release history is available on the project website or on GitHub. However, several virtual testing environments in our survey have not received recent updates. These projects were mainly created and maintained as part-time work by individual developers; nonetheless, they still serve as alternative options when a free, lightweight virtual testing environment is needed. Simulation environments such as OpenDaVINCI, TORCS, ROS-based Gazebo, and V-Rep have relatively larger user groups, which leads to frequent and regular updates. On the other hand, commercial software and integrated solutions do not necessarily provide a release history or previous versions to customers. Instead, news about the latest updates is more commonly distributed through their websites, commercial mailing lists, or RSS feeds.
Accessibility was one of the major concerns during our survey. It is widely accepted that simulation platforms aimed at non-commercial, scientific research should be available as open source under a public license. The most common licenses among open source simulators are GPL, MIT, EPL, and Apache. In addition, a considerable number of commercial software products provide free trial versions or educational licenses, e.g., the V-Rep robot simulator.
It is also important to note that quite a few virtual testing environments could not be cited in our survey because of accessibility issues. For instance, DeepDrive, which aimed at self-driving AI development based on a virtual testing environment built on the game GTA-V, was shut down for legal reasons. Also, many virtual testing environments appeared only in conference and workshop papers and are thus not publicly accessible to peer researchers.
In terms of platforms or operating systems for the virtual testing environments, open source instances are naturally cross-platform projects, as they tend to be developed on Linux and in many cases also on Mac OS, FreeBSD, and others. For example, OpenDaVINCI provides compilable source code for a variety of POSIX-compatible operating systems, whereas CARLA, as a young project, runs on Ubuntu only. Both open source and commercial software provide releases for Windows, and in the latter case mostly for Windows only.
The use-case column contains examples of typical usage for each virtual testing environment. Due to limited accessibility, it was impossible to inspect each environment in detail. Nonetheless, we have compared the use cases according to their usage. We summarize the survey results in the categories illustrated in Table VII and include further examples of use cases for each environment that we surveyed.
The following is a summary of our experience with different virtual testing environments and their fidelity. OpenDaVINCI leads the survey list, as it not only provides a virtual testing environment for simulations but is also a real-time middleware that has been demonstrated on several self-driving vehicles, ranging from the 2007 DARPA Urban Challenge up to the 2016 Grand Cooperative Driving Challenge. VDrift and TORCS stand out as 3D environments with virtual vehicles and AI drivers; the latter additionally benefits from a larger user group and regular updates. The recent Microsoft AirSim extends a formerly drone-dedicated simulator to autonomous vehicle simulation and yields promising performance through its dynamics modeling and the Unreal rendering engine. The same engine is also used in the CARLA project, which provides open digital assets for building urban environments. Last but not least in the list of open source environments for autonomous driving simulation, ROS-based Gazebo stands out with its support for robotics and vehicle dynamics.
On the other hand, commercial testing environments also hold a significant place for researchers and industrial users as integrated solution providers, as they can be adapted to the demands of specific purposes. Both SCANeR Studio (OKTAL) and DYNA4 (TESIS) offer customized training courses for better utilization of their products. The widely acknowledged and mature PELOPS has been supporting industrial and technical research projects for several years. VTD (Vires) contributes to and manages the OpenDRIVE standard for the description of road networks in driving simulations.
In terms of virtual testing environments for traffic flow simulation, the open source project SUMO attracts a large number of users with its microscopic, multi-modal traffic simulation, while PTV Vissum is also competitive by integrating active traffic management and the geometry of road and intersection design. From the point of view of ADAS controller applications, ASM Traffic (dSpace) is typically designed for hardware-in-the-loop (HIL) testing of electronic control units (ECUs) or for early function validation by offline simulation.
The two hardware testing environments with a virtual reality approach listed in our survey are Sim IV (VTI) and the simulation environment of the Hank Virtual Env. Lab. The simulator facility of VTI is dedicated to realistic and simultaneous simulation of lateral and longitudinal acceleration and offers a wide synthetic forward field of vision. The bicycle/pedestrian simulators of the Hank Lab enable direct perception of and interaction with virtual objects in the environment by human users, thus creating a credible platform for traffic safety research.
Alongside the survey of particular use cases provided by or accomplished on the virtual testing environments, we consider validity a significant aspect that deserves consideration in general. The question "How close is the modeled realism in the virtual simulation to the real world (level of fidelity)?" has always been one of the major interests and, in various cases, a threat to validity for all approaches related to virtual testing environments and the simulation work carried out on them [54]. Lighting conditions of synthetic scenarios are another widely recognized issue that limits the validity of virtual testing environments [55], especially for approaches that are closely related to computer vision and sensor simulation.

V. CONCLUSION & FUTURE WORK
The global race to develop, evaluate, and deploy algorithms and solutions to realize self-driving vehicles has significantly heated up: first solutions at SAE Level 3 are being made available to customers, addressing automated driving on highways, for example. The community around this comprises researchers, major automotive OEMs, and young start-ups. They all have in common that they need open-loop and closed-loop solutions to systematically develop, test, and evaluate their approaches. Especially for researchers and young start-ups, the threshold for contributing to the field is high, as collecting one's own datasets is resource-consuming and time-intensive.
Our work presents a combined survey of publicly available datasets together with an overview of virtual testing environments to support the research, development, and evaluation of algorithms in the field of autonomous driving. We present 37 datasets from different perspectives such as included driving situations, sensor setups, and data format; additionally, 22 virtual testing environments are presented to support closed-loop testing. To the best of our knowledge, our work is the first and most comprehensive survey of this kind providing guidance about existing and publicly available assets to researchers and developers.
Future work in this area should evaluate which combination of publicly available datasets and virtual testing environments results in the best fidelity level, i.e., how the resulting performance in reality compares to what has been achieved with open-loop and closed-loop testing. Recent approaches combining dataset collection and virtual environment usage [56], [57] are encouraging for such future work. Furthermore, commonalities between existing datasets could be studied in greater detail to find overlapping or complementary parts; in that regard, a standardized representation or encoding of all datasets could be proposed to enable a simplified comparison of a system under test using various datasets. Such a standardized representation could also serve as a joint interface to various test environments to enable better modularity and reuse. Finally, an empirical study involving many dataset users would help to identify the most crucial factors to be considered during dataset selection.
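As a purely illustrative sketch of what such a standardized representation might look like, the following Python fragment wraps differently formatted CSV recordings behind one common sample type. The names `Sample`, `DatasetAdapter`, `CsvAdapter`, and `replay`, as well as the column names in the usage example, are hypothetical assumptions and are not taken from any of the surveyed datasets.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Protocol


@dataclass(frozen=True)
class Sample:
    """Hypothetical common record shared by all adapted datasets."""
    timestamp_s: float
    sensor: str
    payload: Dict[str, float]


class DatasetAdapter(Protocol):
    """Assumed joint interface: each dataset yields Sample objects."""
    def samples(self) -> Iterator[Sample]: ...


class CsvAdapter:
    """Adapts one character-separated recording to the common record."""

    def __init__(self, text: str, sensor: str, time_column: str,
                 delimiter: str = ",") -> None:
        self._text = text
        self._sensor = sensor
        self._time_column = time_column
        self._delimiter = delimiter

    def samples(self) -> Iterator[Sample]:
        reader = csv.DictReader(io.StringIO(self._text),
                                delimiter=self._delimiter)
        for row in reader:
            # The dataset-specific time column becomes the shared timestamp;
            # all remaining columns are kept as numeric payload.
            timestamp = float(row.pop(self._time_column))
            payload = {name: float(value) for name, value in row.items()}
            yield Sample(timestamp, self._sensor, payload)


def replay(adapters: Iterable[DatasetAdapter]) -> List[Sample]:
    """Merge samples from several datasets into one time-ordered list."""
    merged = [s for adapter in adapters for s in adapter.samples()]
    return sorted(merged, key=lambda s: s.timestamp_s)
```

A system under test would then consume `Sample` objects only, for instance via `replay([CsvAdapter("t;lat;lon\n0.0;57.6;11.8\n", "gps", "t", ";")])`, regardless of the delimiter or column naming of the underlying recording.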