Virtually Augmented Radar Measurements With Hardware Radar Target Simulators for Machine Learning Applications

The acquisition of machine learning (ML) datasets by measurements for automotive radar data requires many resources and much time. On the other hand, the simulation of complex traffic environments with a sufficient level of detail is challenging, too. In this letter, a middle way is proposed in which real measurements are virtually augmented by reflections obtained from simulation models. The augmented measurements are then replayed with a hardware-based radar target simulator (RTS). This enables the fast creation of application-specific datasets, as well as advanced functionality tests of algorithms at deployment. To demonstrate the efficacy of the virtually augmented radar data, a set of test drives is augmented by virtual pedestrians performing traffic gestures. A classifier trained on the created dataset is then demonstrated to achieve a high classification accuracy of 84.0% on real test data.


I. INTRODUCTION
Driver assistance systems with increasingly autonomous functionalities require highly resolved sensor data and advanced scene interpretation abilities. While the quality of sensor data has continuously increased thanks to the rapid advances in sensor technology, the interpretation of the data remains challenging with classical signal processing schemes. To overcome the restrictions of classical signal processing, machine learning (ML) approaches are chosen to analyze the incoming radar data. The use of radar sensor technology has increased greatly in the automotive industry over the years due to its robustness, low sensor costs, and other advantages, such as the possibility to measure velocities directly [1], [2].
ML has been applied to radar data for object and road user detection [3], [4], data segmentation [5], [6], and the classification of vulnerable road users [4] and their performed gestures [7]. However, one of the biggest remaining challenges is the provision of sufficient and suitable training data. Gathering datasets by means of test drives requires a lot of time, causes high costs, and often leads to class imbalances. Approaches with simulation frameworks [8], [9] exist, but require accurate sensor, target, and propagation models, which can be difficult to set up for complex scenarios. Hence, this letter proposes virtually augmented measurements as a compromise between measurements only and simulations only, and applies these data exemplarily to the ML task of traffic gesture recognition. The approach leverages the realistic, detailed scene representation inherent in real measurements, and the possibility to add realistic models of individual targets such as cars or pedestrians. Moreover, a hardware-based radar target simulator (RTS) is chosen for the generation of the augmented radar data for a more realistic simulation. Radar target simulation is an emerging field that aims to close the gap between software simulations and physical test drives [10], [11], [12], [13]. While software simulations fully rely on sensor models, a radar target simulation employs actual radar sensors, thus eliminating the need to model the radar sensor. The basis of the presented approach is the replay of real measurements from target lists [12]. Before replay, the target lists are augmented by reflections from virtual objects [13]. Finally, a radar records the virtually augmented scene fed to the RTS. Besides dataset generation, the replay of arbitrarily augmented scenes enables a wide range of functionality tests for detection and classification algorithms. The potential of the approach is demonstrated by creating a large traffic gesture dataset from real test drives and a pedestrian simulation model [14]. A gesture classifier is trained with the dataset and subsequently achieves a high classification accuracy on real test data.

Corresponding authors: Nicolai Kern; Pirmin Schoeder (e-mail: nicolai.kern@uni-ulm.de, pirmin.schoeder@uni-ulm.de). Associate Editor: W. Pu. Digital Object Identifier 10.1109/LSENS.2024.3359693
The rest of this letter is organized as follows. In Section II, the RTS, the pedestrian simulation model, and the generation process of the new data are introduced. Afterwards, Section III introduces the experimental setup, followed by the ML results in Section IV. Finally, Section V concludes this letter.

II. GENERATION OF VIRTUALLY AUGMENTED RADAR DATA

A. Radar Target Simulation
State-of-the-art automotive radars employ a chirp-sequence frequency-modulated continuous-wave (CS-FMCW) waveform [1], [2], [15]. As shown in Fig. 1, a CS-FMCW radar transmits K chirps, where each chirp is a CW signal whose frequency increases linearly from the start frequency f_0 to the end frequency f_1, sweeping the bandwidth B = f_1 - f_0 within the chirp duration T_c. The transmit signal is reflected from a target at the distance R, which generates a time delay τ = 2R/c_0. By mixing the delayed receive echo of a target with the transmit signal, the IF-signal of the radar for the kth chirp becomes

s_IF,k(t) ∝ e^{j2π(f_b t + f_D k T_c + φ_0)}

with a constant phase offset φ_0. The beat frequency f_b = 2BR/(c_0 T_c) is proportional to the distance R of the target and results in a range-dependent peak in the IF-signal's frequency spectrum. Moreover, the Doppler frequency f_D = 2v f_0/c_0 is proportional to the target's velocity v and is expressed in a phase progression over the chirp index k. Hence, to simulate a target with range R and velocity v, the RTS has to receive the transmit signal, introduce a frequency shift f_b and a phase progression over the chirps corresponding to f_D, and retransmit the manipulated signal towards the radar. Both the frequency shift and the phase progression can be applied by mixing the radar signal with a single modulation signal with frequency f_mod [16]. The radar cross section (RCS) of the target is taken into account by adjusting the amplitude of the modulation signal.
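The mapping from a target's range and velocity to the single modulation frequency a target element applies can be sketched in a few lines. The default carrier, bandwidth, and chirp duration below are illustrative assumptions, not the sensor parameters of the letter:

```python
import math

C0 = 299_792_458.0  # speed of light in m/s

def rts_modulation_frequency(R, v, f0=77e9, B=1e9, T_c=25.6e-6):
    """Frequency shift (Hz) a target element applies to emulate a point
    target at range R (m) with radial velocity v (m/s).

    f0, B, and T_c (carrier, sweep bandwidth, chirp duration) are
    example values chosen for illustration only.
    """
    f_b = 2.0 * R * B / (C0 * T_c)  # beat frequency from the range delay
    f_D = 2.0 * v * f0 / C0         # Doppler frequency from the velocity
    return f_b + f_D
```

For the assumed parameters, a static target at 15 m maps to a shift of roughly 3.9 MHz, and each m/s of radial velocity adds about 514 Hz of Doppler on top.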
In order to match the beamforming capabilities of automotive radars, a spatially distributed RTS is harnessed that allows the simulation of targets with range, velocity, RCS, and azimuth angle. The RTS consists of N individual target elements (TEs) that are placed around the radar as in Fig. 2. Each of the TEs receives the radar transmit signal, modulates it, and sends the signal back towards the RuT. The modulation signal of the nth TE for a set of targets indexed by ξ is

s_mod,n(t) = Σ_ξ A_{n,ξ} e^{j2π f_mod,ξ t}   (5)

where each f_mod,ξ depends on the simulated target's range and velocity. At the mth receive antenna of the RuT, the superimposed signals of the N TEs cause an IF-signal of

s_IF,m(t) = Σ_{n=1}^{N} Σ_ξ A_{n,ξ} e^{jφ_{n,m}} e^{j2π f_mod,ξ t}   (6)

where φ_{n,m} denotes the phase offset introduced at the mth antenna by the nth TE's angular position, if the beat frequency shift due to the distance between the radar and the TEs is neglected. By tuning the complex-valued amplitudes A_{n,ξ}, the target can be simulated at an arbitrary desired angle [17], [18].
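The amplitude tuning can be illustrated for the smallest possible case: two TEs and a two-antenna RuT, where the complex weights are chosen such that the superposition of the TE signals reproduces the steering vector of the desired target angle. This is a sketch under idealized assumptions (half-wavelength array, no mutual coupling, no amplitude errors); the function names are illustrative:

```python
import cmath
import math

def steering(theta, m):
    """mth element of a half-wavelength ULA steering vector for angle theta."""
    return cmath.exp(1j * math.pi * m * math.sin(theta))

def te_weights(theta1, theta2, theta_t):
    """Complex amplitudes for two TEs at angles theta1 and theta2 so that
    their superposition at a 2-element RuT array matches the steering
    vector of a target at theta_t; solved by Cramer's rule."""
    a11, a12 = steering(theta1, 0), steering(theta2, 0)  # at antenna 0
    a21, a22 = steering(theta1, 1), steering(theta2, 1)  # at antenna 1
    b1, b2 = steering(theta_t, 0), steering(theta_t, 1)  # desired target
    det = a11 * a22 - a12 * a21
    w1 = (b1 * a22 - a12 * b2) / det
    w2 = (a11 * b2 - b1 * a21) / det
    return w1, w2
```

With more antennas than TEs, the system becomes overdetermined and the weights would instead be found in a least-squares sense, which is one reason why the angular placement of the TEs matters in practice.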

B. Pedestrian Simulation
The simulation of extended targets with the RTS requires their decomposition into a set of discrete point scatterers, each characterized by its range, velocity, RCS, and, with the spatially distributed RTS setup considered in this work, azimuth angle. For humans, a well-established method is the description of the human body by a set of ellipsoids [19], where the centers of these ellipsoids represent the point scatterers' positions. The RCS of each body part is approximated by the RCS of an ellipsoid [20]. Moreover, further scatterers can be added along the ellipsoid main axes to include velocity spread, and shadowing effects due to self-occlusion can be taken into account [14].
In this letter, the approach from [14] is adapted. The body model is composed of 20 body parts, of which three represent parts of the hand to capture the highly descriptive subtleties of hand motion. An example of the ellipsoids is shown in the upper right image of Fig. 3.
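The per-body-part RCS can be sketched with the classic high-frequency approximation for an ellipsoid, which is the kind of closed-form expression referenced above; the exact formulation used in [20] may differ in convention:

```python
import math

def ellipsoid_rcs(a, b, c, theta, phi):
    """Monostatic RCS (m^2) of an ellipsoid with semi-axes a, b, c (m),
    viewed from spherical aspect angles theta (from the c-axis) and phi.
    High-frequency approximation; reduces to pi*r^2 for a sphere."""
    st, ct = math.sin(theta), math.cos(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    denom = (a**2 * st**2 * cp**2 + b**2 * st**2 * sp**2 + c**2 * ct**2) ** 2
    return math.pi * a**2 * b**2 * c**2 / denom
```

A quick sanity check is the sphere case a = b = c = r, where the expression collapses to the aspect-independent value πr² regardless of the viewing angles.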

C. Virtually Augmented Scenarios
With the presented RTS, it is possible to let an RuT record arbitrary combinations of scenarios and targets [13], as long as lists of detectable targets in the observed scene are available. One possible way to create a scenario is to replay real measurements by composing s_mod(t) of the targets actually detected by a CFAR algorithm [12]. This is demonstrated in the left column of Fig. 3, where Doppler spectrograms of a real test drive and its replay with the help of an RTS are shown. A second way is the generation of artificial target lists, e.g., from a simulation model. For example, in the right column of Fig. 3, a traffic gesture simulation as observed from a driving vehicle serves as input to the RTS. Importantly, the pedestrian position changes over time in accordance with the real driving path, such that the pedestrian's Doppler spectrogram exhibits the same velocity profile, superimposed by the motion components caused by gesturing.
For matched driving paths, it is possible to combine the real measurements with artificial reflections from target models, corresponding to a virtual augmentation of the real measurement. For the example of simulated traffic gestures, this is shown in the lower part of Fig. 3, where the modulation signal of the RTS contains both the radar responses of the scene and the signature of the virtual gesturing pedestrian. Thus, the radar is presented with a scene as illustrated in Fig. 3.
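The augmentation step itself can be sketched as a simple merge of target lists, assuming a (range, velocity, RCS, azimuth) tuple per detection; the interface and the simplification that the virtual scatterers carry a precomputed radial velocity are illustrative assumptions, not the letter's actual implementation:

```python
import math

def augment_target_list(real_targets, virtual_scatterers, ego_xy):
    """Append virtual point scatterers to a real CFAR target list.

    real_targets: list of (range_m, radial_velocity, rcs, azimuth_rad)
    virtual_scatterers: list of (x, y, radial_velocity, rcs) in global
        coordinates, e.g. the ellipsoid centers of the pedestrian model
    ego_xy: current ego vehicle position (x, y) on its real driving path

    The virtual positions are converted into the ego vehicle's polar
    coordinates, so the pedestrian's range and angle follow the drive.
    """
    ex, ey = ego_xy
    augmented = list(real_targets)
    for x, y, v_r, rcs in virtual_scatterers:
        dx, dy = x - ex, y - ey
        rng = math.hypot(dx, dy)      # range relative to the vehicle
        az = math.atan2(dy, dx)       # azimuth relative to the vehicle
        augmented.append((rng, v_r, rcs, az))
    return augmented
```

Calling this per measurement cycle with the ego position sampled along the recorded trajectory yields the matched driving paths described above.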

III. EXPERIMENTAL SETUP
The virtually augmented radar data are applied to traffic gesture recognition to examine their efficacy in a real-world application. To this end, a dataset is generated following the principle in Fig. 3, with the augmented radar data containing real driving scenarios enriched by simulated traffic gestures. The dataset creation requires a database of real test drives as well as motion data of traffic gestures that animate the pedestrian simulation model. A data-driven approach is followed to assess the data quality: the virtually augmented radar data are applied to train a 10-layer convolutional neural network [21], which is subsequently tested on real traffic gesture measurements. The constituents for the experiments, namely the recorded driving scenarios, the motion data, and the real test data, are introduced in the following.

A. Measurement Setup
All real-world data are acquired with a radar sensor network of two CS-FMCW radar sensors mounted to a vehicle bumper. Both sensors operate at 77 GHz with a range and velocity resolution of ΔR = 7.4 cm and Δv = 11.0 cm/s, respectively. The two sensors receive a common trigger but record independently otherwise. The scene replay with the RTS is performed with the same radar parameters. In order to replay the augmented test drives for both sensors, the two different modulation signals are presented to a single RuT one after another, thus emulating the radar sensor network.
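From the stated resolutions, the underlying waveform parameters can be inferred with the textbook FMCW relations ΔR = c/(2B) and Δv = λ/(2 T_obs). The sketch below performs this inversion; the resulting numbers are inferences from the resolutions, not values reported in the letter:

```python
C0 = 299_792_458.0  # speed of light in m/s

def implied_bandwidth(delta_r):
    """Sweep bandwidth B implied by a range resolution: delta_r = c/(2B)."""
    return C0 / (2.0 * delta_r)

def implied_observation_time(delta_v, f0=77e9):
    """Coherent observation time implied by a velocity resolution:
    delta_v = lambda/(2 T_obs), with lambda = c/f0."""
    lam = C0 / f0
    return lam / (2.0 * delta_v)
```

For ΔR = 7.4 cm this implies a sweep bandwidth of roughly 2 GHz, and Δv = 11.0 cm/s at 77 GHz implies a coherent processing interval of roughly 18 ms per measurement cycle.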
With the sensors mounted to the vehicle, 38 test drives were conducted at several locations on the campus of Ulm University, which are the basis for the virtually augmented radar data.An exemplary driving environment is shown in Fig. 3.

B. Traffic Gesture Motion Data
The traffic gesture motion data are taken from prior work [22] and have been recorded with a stereo camera system. In the experiments, seven gestures are considered for which reliable motion data are available, namely "Come Closer" (g1), "Slow Down" (g2), "Wave" (g3), "Push Away" (g4), "Wave Through" (g5), "Start" (g6), and "Stop" (g7).
With the real driving measurements and the motion data, 930 random combinations of gesturing person and test drive are created, and the simulation procedure is carried out for both radar sensors. Target lists are computed for the new data recordings, and spatially filtered and ego-motion-compensated Doppler spectrograms are created for the pedestrian in the scene [7]. The spectrograms are subsampled into shorter snippets with a length of 2 s to finally obtain a dataset with 5411 samples for training.
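The subsampling into fixed-length snippets can be sketched as a sliding window over the spectrogram frames. The letter only states the 2 s snippet length, so the hop size and frame-based representation below are assumptions for illustration:

```python
def slice_spectrogram(frames, frame_rate, snippet_s=2.0, hop_s=1.0):
    """Cut a spectrogram (a list of per-frame Doppler spectra) into
    overlapping fixed-length snippets.

    frames: sequence of spectrogram columns (one per measurement frame)
    frame_rate: frames per second
    snippet_s / hop_s: snippet length and hop (hop is an assumed choice)
    """
    n = int(round(snippet_s * frame_rate))
    hop = int(round(hop_s * frame_rate))
    return [frames[i:i + n] for i in range(0, len(frames) - n + 1, hop)]
```

Overlapping windows of this kind are a common way to multiply the number of training samples from a limited set of recordings, which is consistent with 930 combinations yielding 5411 samples.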

C. Real Traffic Gesture Measurements
For testing purposes, real traffic gesture measurements from [7] are evaluated, which have been recorded while driving. The test dataset contains data from five participants who have not been part of the motion data recording. In total, approximately 1500 samples with a length of 2 s are available for testing, with sample mean ranges from 5 to 23 m.

IV. EXPERIMENTAL RESULTS
After training the classifier on the synthetic samples, its classification accuracy on the real traffic gesture measurements is examined for different mean distances between the vehicle and the pedestrian during the samples' time span of 2 s. The resulting accuracy-versus-range curve is shown in Fig. 4. Within the first 10 m, 84.0% of the measured samples are classified correctly without the classifier ever having seen a measured gesture. For larger distances, this value falls off, which can be explained by the distribution of the synthetic samples. Due to range limitations caused by the digital-to-analog converter's low-pass filter, which limits the highest frequency component of s_mod(t), distances larger than 16 m cannot be realized. Hence, due to the vehicle's ego-motion, the majority of the samples in the training dataset ends at ranges R_min well below 16 m, and the course of the accuracy exhibits a close correlation with the sample distribution over R_min (cf. Fig. 4).
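The range-resolved evaluation behind such a curve can be sketched as binning the test samples by mean range and computing the per-bin accuracy; this is a minimal illustration of the evaluation principle, not the authors' code:

```python
def accuracy_by_range(samples, edges):
    """Per-range-bin classification accuracy.

    samples: iterable of (mean_range_m, correct) pairs, where correct
        indicates whether the classifier's prediction matched the label
    edges: ascending bin boundaries in meters, e.g. [0, 5, 10, 15, 20, 25]

    Returns one accuracy per bin, or None for empty bins.
    """
    hits = [0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for rng, correct in samples:
        for b in range(len(edges) - 1):
            if edges[b] <= rng < edges[b + 1]:
                counts[b] += 1
                hits[b] += int(correct)
                break
    return [h / c if c else None for h, c in zip(hits, counts)]
```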
Moreover, the confusion matrix in Fig. 5 allows a more in-depth understanding of the results when training on virtual gestures only.Most notably, the confusion behavior well reflects the confusion clusters observed in real data [22], with increased confusion potential between highly similar gestures such as "Thank You" and "Stop."

V. CONCLUSION
In this letter, an approach for the virtual augmentation of radar measurements has been presented that aims at the provision of data for the training and validation of radar-based ML algorithms. To demonstrate the efficacy of the generated data, real-world test drives have been augmented with the help of an RTS by virtual gesturing pedestrians whose positions over time are adjusted to the vehicle's trajectory. The scenes replayed by the RTS have been recorded with a radar sensor, and the resulting traffic gesture dataset has been shown to provide a good basis for the classification of real traffic gesture recordings. Future work might take a closer look at the interplay between RTS and machine learning, for example in the case of strong effects induced by sensor imperfections, where data generation with an RTS is expected to be particularly beneficial.

Fig. 2. RTS setup: The transmit signal of the radar-under-test (RuT) is modulated and reflected by four target elements.

Fig. 3. Illustration of the approach: By combining real environments with gesture simulations, the RTS creates virtually augmented drives.

Fig. 4. Accuracy on the measured test dataset versus mean sample range. The bar plot shows the distribution of the virtually augmented samples vs. the sample minimum range R_min.

Fig. 5. Confusion matrix of the classifier on the measured test data.