Simultaneous Localization and Mapping (SLAM) for Synthetic Aperture Radar (SAR) Processing in the Field of Autonomous Driving


Abstract—Autonomous driving technology has made remarkable progress in recent years, revolutionizing transportation systems and paving the way for safer and more efficient journeys. One of the critical challenges in developing fully autonomous vehicles is accurate perception of the surrounding environment. Radar sensor networks provide a capability for robust environmental detection. It has become apparent that the principle of synthetic aperture radar (SAR) can be employed not only in the field of earth observation but also, increasingly, in the field of autonomous driving. With the help of radar sensors mounted on vehicles, large synthetic apertures can be created and thus a high angular resolution is achieved, which ultimately allows detailed images to be obtained. Increasing image quality, however, also increases the demands on position accuracy and thus on the localization of the vehicle in the map. Since relative localization accuracies in the millimeter range over long trajectories cannot be achieved with conventional Global Navigation Satellite Systems (GNSS), so-called simultaneous localization and mapping (SLAM) algorithms are often employed. This paper presents a purely radar-based SLAM algorithm, which allows high-resolution SAR processing in the automotive frequency range of 77 GHz. The presented algorithm is evaluated by measurements for trajectories with a length of up to 500 m and a measurement duration of more than two minutes.
Index Terms—Chirp-sequence radar sensors, radar imaging sensors, radar sensor networks, synthetic aperture radar, SAR, simultaneous localization and mapping, SLAM.

I. INTRODUCTION
THE field of autonomous driving is advancing rapidly, placing increasing demands on environment recognition and sensor systems. A variety of sensor systems can be used for this purpose, each with its own strengths and weaknesses. LIDAR systems and cameras offer high angular resolution [1], but they lack robustness in adverse weather conditions [2]. Radar systems, on the other hand, have lower angular resolution but are more robust to weather conditions. In recent years, the use of radar sensor networks has offered significant advantages in terms of detection capability and accuracy [3], [4], [5], allowing these sensors to be utilized for mapping as well.
The target information obtained from radar raw data can be used to create environmental maps, e.g., grid maps [6]. These offer the advantage of a cell-based representation of the environment, which makes them efficient to process. Grid maps can be generated either from target lists (T-GM) or from raw data (R-GM). Target list-based mapping approaches can be divided into amplitude grid maps and probabilistic occupancy grid maps. Probabilistic occupancy grid maps have been shown to provide a more robust representation of the environment with significantly increased resolution.
One way to represent the environment with a much higher resolution without changing the hardware is to use synthetic aperture radar (SAR). SAR processing of the raw data from a mm-wave radar sensor mounted on a moving vehicle can create a larger aperture, which results in increased angular accuracy compared to target list-based mapping approaches [7], [8], [9]. Since SAR processing also represents the environment in a cell-based manner, it is called a raw data-based grid map (R-GM). SAR maps provide a much more accurate representation of the environment than target list-based grid maps due to the huge aperture spanned [10], [11]. Target list-based grid maps, on the other hand, are very computationally efficient and can be implemented in real time without much hardware overhead.
Although T-GMs and R-GMs represent the map differently, their processing has one major common feature: the position of the radar sensors and the vehicle must be known at each measurement interval. While the position of the radar sensors at each frame must be known for T-GMs, the position of the radar sensors at each transmit interval must be known for SAR processing. T-GMs have a lower localization requirement due to their lower maximum map resolution, so that a position accuracy of 3 cm is sufficient to obtain an artifact-free image of the environment. Nowadays, such position accuracy in the lower centimeter range can be achieved with a differential Global Navigation Satellite System (GNSS) [12]. In SAR processing, however, the range-compressed data must be phase-corrected according to the position of the radar sensor. Therefore, in theory, a relative accuracy of better than λ/4 should be maintained. For radar sensors typically used in automotive applications with a frequency of 77 GHz, the maximum localization error corresponds to 1.8 mm. This accuracy cannot be achieved nowadays with ordinary GNSS. Furthermore, dense cloud conditions, high buildings, or driving in tunnels or underground garages impede satellite-based localization, so that the localization accuracy is significantly degraded or fails completely. Inertial measurement systems such as an inertial measurement unit (IMU) or a wheel speed sensor have the advantage of being able to measure the speed of the vehicle independently of the external environment. However, their suitability for long-term localization is limited: the speeds are integrated according to dead reckoning, and thus a small speed error leads to a large position error for long trajectories. Further problems of all these systems are, on the one hand, their high cost and, on the other hand, the time synchronization between the radar sensors and the corresponding localization systems, which must be in the range of a few microseconds to guarantee coherent SAR processing.
The problem of insufficient position accuracy for SAR processing in the 77 GHz frequency range is well known, and various algorithms attempt to solve it. In an early stage of the research, the radar sensor was placed at thousands of different positions and a measurement was performed at each position to span a synthetic aperture [13]. Building on these results, the radar sensor was mounted on a moving vehicle to generate a synthetic aperture while driving. In this process, the trajectory length could be extended stepwise from a few meters [9], [14], [15] up to 75 m to generate high-resolution SAR images [16], [17], [18]. Although ground-truth information was a prerequisite, the velocity errors had to be compensated with different autofocusing methods to obtain a sharp image of the environment. As a consequence, it could be shown that ego-motion-based dead reckoning exclusively from the radar sensors used allows a trajectory estimation on the basis of which SAR processing is possible [8], [19], [20]. However, all these methods allow SAR processing only for rectilinear trajectories and limited coherence.
Therefore, this paper describes an accurate position estimation algorithm based solely on radar data, allowing SAR processing over long and curved trajectories. This enables localization completely independent of external systems such as GNSS or an IMU. For this purpose, different algorithms such as ego-motion-based dead reckoning and scan-matching methods are fused using a graph-based SLAM algorithm. Due to the long trajectory and the high requirements on accuracy, cross-frame constraints are added to the graph; thus, landmarks from different spatial directions are fused with each other. To avoid further increasing the computational complexity of the already compute-intensive SLAM, the SLAM is processed in combination with the T-GM. A further advantage of applying scan matching to the T-GM rather than directly to the SAR image is the decoupling of the image quality from the ego-motion estimation. Once the trajectory is estimated, it is used for the significantly higher-resolution SAR processing.
The paper is organized as follows: The sensor setup and the radar architecture are presented in Section II. In Section III the basic structure of the signal processing chain is described. In Section IV the conventions of SAR processing and probabilistic T-GM are derived. Subsequently, possibilities for radar-based ego-motion estimation and corresponding dead reckoning are derived in Section V-A, which are fused with the presented scan-matching algorithms from Section V-B to set up the graph-based SLAM described in Section V-C. The presented algorithms are subsequently evaluated qualitatively and quantitatively in Section VI for different measurement runs on a parking lot with a length of several hundred meters.

Fig. 1. Top: chirp-sequence modulation scheme for N_k ramps and two frames (1: orange, 2: blue). Bottom: sensor position for each transmitted ramp spanning a synthetic aperture [18].

II. CONCEPT AND SYSTEM ARCHITECTURE
The sensor configuration consists of incoherently-networked chirp-sequence radar sensors mounted on a non-stationary vehicle. The chirp-sequence radar sensors have multiple transmit and receive antennas enabling multiple input multiple output (MIMO) operation. The radar sensors emit frequency modulated continuous wave (FMCW) ramps. These are characterized by the bandwidth f_B, the up-chirp time T_c, the ramp repetition time T_r, and the start frequency f_0 [21], [22]. The radar sensors are frame-based and emit N_k ramps within each of the N_f frames. This notation and the modulation scheme are shown in Fig. 1.
For comprehensibility, both the amplitude in (1) and the counting index n_k are neglected. κ describes the slope of the frequency ramp, which results from the ratio between the bandwidth f_B and the up-chirp time T_c. φ_0 describes the start phase of the emitted chirp. The generated transmit signal is emitted with the help of the transmit antenna, reflected at the target, and then received again with the help of the receive antenna. For this, the signal in air needs a transit time of τ = 2r/c_0, where r describes the distance between the sensor and the target. Thus, the received signal corresponds to the time-delayed transmitted signal. Due to the radar architecture, the transmit signal and the receive signal are mixed analogously and lowpass filtered, resulting in the intermediate frequency (IF) signal. For each of the N_k emitted ramps and N_f frames, this IF signal is subsequently processed as described in Section III.

Fig. 2. Radar network with N_r = 6 radar sensors in violet with different orientations. Vehicle coordinate system (x^c, y^c) in blue, dashed, and corresponding velocity components in orange, solid lines [18], [25].
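The dechirping principle described above can be illustrated numerically: mixing the delayed echo with the transmit ramp yields a beat frequency proportional to range, which the range FFT resolves. All parameter values in this sketch are illustrative assumptions, not the paper's sensor configuration.

```python
import numpy as np

# Illustrative FMCW dechirp sketch: one ramp, one point target.
# Parameter values below are assumptions for demonstration only.
c0 = 3e8              # speed of light [m/s]
f_B = 1e9             # ramp bandwidth [Hz]
T_c = 50e-6           # up-chirp time [s]
N_s = 512             # fast-time samples per ramp
kappa = f_B / T_c     # ramp slope
r_true = 30.0         # target range [m]

t = np.arange(N_s) * T_c / N_s
tau = 2.0 * r_true / c0                  # round-trip transit time
f_beat = kappa * tau                     # IF (beat) frequency after mixing
s_if = np.exp(2j * np.pi * f_beat * t)   # noise-free complex IF signal

# Range compression (Section III): FFT bin u maps to range u * c0 / (2 f_B)
u = int(np.argmax(np.abs(np.fft.fft(s_if))))
r_est = u * c0 / (2.0 * f_B)
```

The recovered range equals the simulated one here because the beat frequency falls exactly on an FFT bin; with arbitrary ranges, the estimate is quantized to the range-bin spacing c0/(2 f_B).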
The radar sensor network consists of an arbitrary number of sensors and is sketched for N_r = 6 sensors in Fig. 2. The vehicle coordinate system (x^c, y^c) is shown in blue. The velocity vectors for all degrees of freedom (DoF) are shown in orange. These DoFs comprise the vehicle velocity in the local x and y directions, as well as the yaw rate ω acting at the center of the rear axle.

III. PROCESSING CHAIN
Based solely on the raw data s_IF of all radar sensors, a high-resolution SAR image is processed. The algorithm used for this is shown in Fig. 3 for N_r sensors and N_f frames. For clarity, each frame in the sketch consists of only N_k = 3 ramps, as opposed to N_k = 384 in the later evaluated measurement.
In the first step, the signals s_IF of all radar sensors are digitized and transformed independently into the frequency domain to obtain the range-compressed fast-time samples. Based on these fast-time samples, SAR processing is performed afterwards as described in Section IV-A. The problem in this case is the localization and therefore the knowledge about the trajectory of the vehicle. To determine this trajectory, a SLAM algorithm is applied. A graph-based SLAM is employed in this case, which determines the most likely trajectory based on the fusion of ego-motion estimation and KAZE-feature-based scan matching. However, this SLAM does not optimize the SAR-processed environment but a robust and significantly more efficient target list-based occupancy grid map as described in Section IV-B. To create such a target list-based occupancy grid map, the target lists have to be determined independently for each sensor. For this, common signal processing techniques (windowing, slow-time Fourier transform, constant false alarm rate (CFAR) detection, and angle of arrival (AoA) estimation) are applied to the IF data of all N_k emitted ramps of a frame. Thus, with each of the N_r chirp-sequence radar sensors used, the azimuth angle φ^s_{n_r,n_t}, the range r^s_{n_r,n_t}, and the radial velocity v_{r,n_r,n_t} in the local sensor coordinate system (s) are determined for each n_t-th target, where N_t describes the number of targets detected.
Using the occupancy grid map (OGM) for the SLAM algorithm enables velocity-independent environment mapping for one frame, reducing the complexity of the computationally expensive algorithm.

IV. MAPPING
In general, the environment M to be mapped is gridded into N_i cells. Subsequently, for each cell m_i, either the occupancy probability is computed based on OGM processing or the received power is computed based on SAR processing. Thus, either an efficient OGM or a high-resolution SAR image can be processed based on the same radar data. Due to its efficient processing, the SLAM from Section V-C operates on the OGM. The subsequent high-resolution environment mapping is created using SAR processing based on the SLAM-estimated trajectory.

A. Synthetic Aperture Radar (SAR)
In general, there are many different methods to generate SAR images of the environment. Basically, a distinction is made between algorithms in the frequency domain and algorithms in the time domain [26], [27], [28]. The advantage of the time-domain algorithms lies in their universal application range, since the trajectory does not have to meet any special requirement such as linearity. Therefore, the backprojection (BP) algorithm, which operates in the time domain, is applied in the following.
For each cell m_i of the environment M to be imaged, the measured range information is phase-corrected according to the range to be evaluated. This corresponds to a matched-filter-based phase correction. If the cell m_i to be evaluated represents a target, the signals of all N_k ramps overlap constructively after phase correction, whereas these pointers overlap randomly if the cell represents noise.
According to Fig. 3, range compression is first performed based on the fast-time samples S^{n_f,n_k}_IF, which describe the sampled IF signal s_IF(t) from (3). For clarity, the time t_k is used in the following to describe the time of an emitted ramp. In contrast, t_f describes the time of a frame, which is relevant for the calculation of grid maps. The range-compressed data correspond to the transformation of the fast-time samples into the frequency domain and are described by a Fourier transformation, where A_{t_k} is the range information vector at time t_k and N_s corresponds to the number of samples of a frequency ramp. As long as no zero-padding is performed and IQ sampling is applied, N_s also corresponds to the length of the frequency vector. Each entry of the vector A_{t_k} corresponds to a complex value, which is assigned to a certain frequency and thus to a certain distance r.
Based on the range information r and the assigned complex amplitudes A_{t_k}, SAR processing with the BP algorithm is performed for the environment M to be imaged. To each of the cells a received power G(m_i) is assigned. This complex value is iteratively calculated based on all measurements and all position data χ using the backprojection algorithm [29], [30], [31].
The received power G(m_i) of the cell at measurement time t_k is composed of the cell's received power at time t_{k−1} and the phase-corrected measured value of the current measurement, where the correction term describes a distance-dependent phase rotation that ensures constructive superposition for targets. The assignment of the correct frequency bin of the current measurement A_{t_k} to the cell m_i is described by A^u_{t_k}(m_i | t_k, χ_{t_k}), where A^u_{t_k} denotes the u-th entry of the vector A_{t_k}. The position index u for the vector A_{t_k} is given by the ratio of the cell distance to the range-bin spacing, rounded to the nearest integer.
In this case, d_{m_i} describes the distance between the n_r-th radar sensor at the global (g) position P^g = [P^g_x, P^g_y] and the cell m_i. The final SAR image is processed exclusively in the xy-plane at the height of the radar sensors, given that the radar sensors, along with the vehicle's motion, span a synthetic aperture limited to the xy-plane.
According to (8), each cell of the map to be imaged is assigned a distance and thus, according to (6), a complex-valued received power. This complex pointer must be rotated by the phase according to the backprojection algorithm (7) to obtain a constructive or destructive overlay.
These steps are performed for each of the N_i cells of the rasterized map, as well as for all N_k emitted ramps of each frame. The movement of the vehicle leads to a continuously changing sensor position, which results in a large synthetic aperture. If a cell represents a target, all complex-valued received powers overlap constructively, whereas they overlap randomly if the respective cell does not represent a target.
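The accumulation described above can be sketched as follows. The toy grid, the sensor trajectory, and the sign convention of the phase correction are assumptions for illustration; the paper's exact expressions (6)-(8) are not reproduced.

```python
import numpy as np

# Hedged backprojection sketch: per ramp, each cell looks up the
# range-compressed bin matching its distance and adds it after a
# distance-dependent phase correction.
c0, f_0, f_B, N_s = 3e8, 77e9, 1e9, 512
dr = c0 / (2.0 * f_B)                    # range-bin spacing after the FFT

def bp_accumulate(G, cells, A_tk, pos):
    """Add one ramp's range-compressed vector A_tk to the image G."""
    for i, (cx, cy) in enumerate(cells):
        d = np.hypot(cx - pos[0], cy - pos[1])
        u = int(round(d / dr))           # frequency bin matching the cell
        if u < len(A_tk):
            # phase correction for constructive superposition at range d
            G[i] += A_tk[u] * np.exp(1j * 4.0 * np.pi * f_0 * d / c0)
    return G

# Toy aperture: the sensor moves along y, a point target sits in cell 0.
cells = [(10.0, 0.0), (13.0, 0.0)]
G = np.zeros(len(cells), dtype=complex)
for sy in np.linspace(-1.0, 1.0, 21):
    d_t = np.hypot(10.0, sy)             # true range to the target
    A = np.zeros(N_s, dtype=complex)
    A[int(round(d_t / dr))] = np.exp(-1j * 4.0 * np.pi * f_0 * d_t / c0)
    G = bp_accumulate(G, cells, A, (0.0, sy))
```

After the 21 simulated ramps, the target cell has accumulated coherently while the empty cell has not, which is exactly the constructive/random superposition the text describes.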

B. Occupancy Grid Mapping
The creation of target list-based grid maps is fundamentally different from the creation of SAR mappings. While SAR processing is ramp-based, OGM processing is frame-based. According to Fig. 3, all ramps of a frame are taken to obtain a target list. Subsequently, the targets are added to the map M according to their probabilistic properties.
Each of the N_i cells of the map M represents a probability in the OGM and not an integrated received power as is the case in SAR processing. The processing of the OGMs is fundamentally divided into three steps:
• modeling of free space
• calculating target probability
• updating the probabilistic map
These steps are described below with reference to [32], [33], and [34]. The goal is to determine the occupancy probability p(m_i | 1:t_f, χ_{1:t_f}) of cell m_i at time t_f based on all measurements and all pose information χ_{1:t_f} up to the current time t_f.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
The first step of probabilistic OGM processing involves modeling the free space. This free-space model describes the occupancy probability p_FS of each cell if no target is detected by the CFAR at that position and is both range- and angle-dependent. While the free-space probability is modeled proportional to the distance squared, it is approximated with a cosine in the azimuth plane.
The parameter c_1 describes the minimum probability of the free-space model and has to be chosen according to the radar sensor and the applied CFAR and angle estimation algorithms.
For the radar sensors, c_1 = 0.3 is used to avoid suppressing missing detections due to clustered CFAR detection. The variables R_max and FoV_φ describe the maximum measurable distance of the radar sensor and the beam angle of the utilized antenna. According to the properties of the free-space model, this probability p_FS(m_i | χ_{1:t_f}) must generally lie in the interval [0; 0.5] and must be normalized accordingly. The variable φ_{m_i} describes the azimuth angle in sensor coordinates between the sensor and the cell m_i to be evaluated. The free-space model is shown in Fig. 4a for a sensor at position P^g = [0 m, 0 m]. Next, for each of the N_t detected targets, the target probability p_T must be determined. This is modeled based on the measured distance r^s_{n_r,n_t}, the azimuth angle φ^s_{n_r,n_t}, and the SNR. In the following, the indices are neglected for clarity.
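A free-space model with the stated properties (minimum probability c_1, quadratic growth with range, cosine shape in azimuth, values confined to [c_1, 0.5]) can be sketched as below. The exact functional form, R_max, and the field of view are assumptions for illustration, not the paper's.

```python
import math

# Hedged free-space inverse sensor model: occupancy probability of a
# cell in which no target was detected. Functional form is assumed.
c1 = 0.3                  # minimum free-space probability (from the text)
R_max = 100.0             # maximum measurable range [m] (assumed)
FoV = math.radians(120)   # antenna beam width (assumed)

def p_free_space(r, phi):
    if r > R_max or abs(phi) > FoV / 2:
        return 0.5                            # outside the sensed region: unknown
    radial = c1 + (0.5 - c1) * (r / R_max) ** 2   # quadratic growth with range
    azimuth = math.cos(phi * math.pi / FoV)       # 1 at boresight, 0 at FoV edge
    return 0.5 - (0.5 - radial) * azimuth
```

Close to the sensor and at boresight the model is most confident that the cell is free (probability c_1 = 0.3); toward R_max or the beam edge it decays to the uninformative value 0.5.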
The variable h is used to vary the slope of the function p_SNR, which adapts the function to the radar sensor employed. The target probability p_T describes the arithmetic mean of all three sub-probabilities. A Gaussian distribution is used to model the probability of each target within the interval [0.5; 1], accounting for the uncertainties σ_r and σ_φ associated with the radar sensor.
The matrix represents the covariance matrix of the measurement, while the extracted target information is denoted as µ. The vector k specifies the cell that is being evaluated.
Consequently, the probability distribution of each detected target extends across multiple cells of the free-space map. The registration of each of the N_t detected targets is performed, allowing the probability of a cell m_i at the current measurement time t_f to be characterized. If this procedure is performed for many measurements, an OGM can be calculated. For this, the probabilities of the current measurement p(m_i | t_f, χ_{t_f}) are combined with the probabilities of all previous measurements p(m_i | 1:t_f−1, χ_{1:t_f−1}) and all sensors.
The symbol ⊚ describes the probabilistic update step according to [35].
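Such a probabilistic update is commonly realized as a binary Bayes filter in log-odds form; the sketch below assumes that formulation rather than reproducing the exact operator of [35].

```python
import math

# Hedged sketch of the map update "⊚" as a binary Bayes filter:
# probabilities are fused by adding their log-odds.
def log_odds(p):
    return math.log(p / (1.0 - p))

def inv_log_odds(l):
    return 1.0 - 1.0 / (1.0 + math.exp(l))

def update(p_prior, p_meas):
    """Fuse the map prior of a cell with the current measurement probability."""
    return inv_log_odds(log_odds(p_prior) + log_odds(p_meas))
```

Repeated occupied observations (p > 0.5) drive a cell toward 1, free observations (p < 0.5) toward 0, and p = 0.5 leaves the cell unchanged, matching the intervals used by the free-space and target models above.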

V. SELF-LOCALIZATION
The mapping algorithms described in Section IV enable an environment mapping once the global vehicle pose P^g_v, consisting of the x and y position of the vehicle and its orientation, is known. In general, however, this vehicle pose P^g_v is not known and must be determined, which is done below using a graph-based SLAM method for an unknown environment M. The graph-based SLAM algorithm consists of three steps:
• radar-based ego-motion estimation
• scan matching
• solving the graph equation
These steps are derived below to enable stand-alone radar-based self-localization.

A. Ego-Motion Estimation
In [36], [37], [38], [39], [40], and [41] it was shown that the velocity components v_x and v_y as well as the yaw rate ω (corresponding to Fig. 2) can be determined with a radar sensor network (N_r ≥ 2) based on the target lists. For this purpose, the following model describing the relationship between the measured target information v_{r,n_r,n_t}, φ^s_{n_r,n_t} and the intrinsic velocity V_p is used:
In this case, x^c_{n_r} and y^c_{n_r} denote the sensor positions and φ^c_{n_r,n_t} the incidence angles relative to the vehicle coordinate system, which are calculated from the sensor mounting geometry. As soon as more than two radar sensors are employed on the vehicle, the equations can be combined into a common system equation, and thus the velocity vector V_p can be uniquely determined.
Subsequently, (25) is solved for the searched velocity vector V_p, for example using the Moore-Penrose pseudo-inverse of Q.
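The least-squares solution can be sketched as follows: each stationary target contributes one row of the linear system Q·V_p = v_r, which is then solved with the pseudo-inverse. The sign convention of the radial velocity and the exact row structure are assumptions consistent with the model, not a verbatim copy of (25).

```python
import numpy as np

# Hedged sketch of the ego-motion least-squares estimate.
def estimate_ego_motion(phi, sx, sy, v_r):
    """phi: incidence angles in the vehicle frame; (sx, sy): sensor
    mounting positions; v_r: measured radial velocities.
    Returns the velocity vector V_p = (v_x, v_y, omega)."""
    phi, sx, sy, v_r = map(np.asarray, (phi, sx, sy, v_r))
    Q = np.column_stack([
        np.cos(phi),                          # coefficient of v_x
        np.sin(phi),                          # coefficient of v_y
        sx * np.sin(phi) - sy * np.cos(phi),  # coefficient of omega
    ])
    return np.linalg.pinv(Q) @ v_r            # Moore-Penrose solution

# Synthetic check: forward-simulate radial velocities and recover V_p.
phi = np.array([0.1, 0.8, -0.5, 1.2])
sx = np.array([1.0, 1.0, -1.0, 2.0])
sy = np.array([0.5, -0.5, 0.5, 0.0])
true = np.array([10.0, 0.5, 0.1])             # v_x, v_y, omega
v_r = (np.cos(phi) * true[0] + np.sin(phi) * true[1]
       + (sx * np.sin(phi) - sy * np.cos(phi)) * true[2])
est = estimate_ego_motion(phi, sx, sy, v_r)
```

Note that with targets from a single sensor position the third column becomes a linear combination of the first two, so the yaw rate is unobservable; this is why the text requires N_r ≥ 2.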
The velocity of the vehicle is calculated by considering stationary targets, as expressed in (25). To accurately estimate the vehicle's velocity, at least two sensors must detect a minimum of three targets in total, enabling velocity estimation for all three degrees of freedom (DoFs). As (25) applies exclusively to stationary targets, any non-stationary targets or false detections need to be eliminated. To filter out these outliers, which do not conform to the expected motion model, a random sample consensus (RANSAC) algorithm [42] is employed.
1) RANSAC-Filtering: In order to ensure a dependable and resilient estimation of the ego-motion, it is essential to eliminate outliers. One effective approach is to utilize an iterative RANSAC algorithm with a specified number of iteration steps, denoted as N_it. During each iteration step n_it ≤ N_it of the RANSAC algorithm, a velocity model V^{n_it}_p is estimated by selecting three targets randomly from the target lists (TL_1, . . ., TL_{n_t}). Subsequently, the velocity difference D_V between V^{n_it}_p and all measured radial velocities V_r is calculated. Following that, the accuracy of the velocity estimation is evaluated based on the number of inliers. Inliers represent targets with velocity errors below a suitable threshold T_v relative to the currently estimated motion model. It is crucial to select a threshold that encompasses the majority of real stationary targets, taking into account measurement inaccuracies and noise. To ensure this, a threshold of T_v = 0.7 m/s is chosen for the evaluation. In this iterative RANSAC process, D^{n_it}_{V,j} describes the j-th element (j ≤ J, where J denotes the number of detections) of the vector D^{n_it}_V, and 1 specifies the indicator function. The model that provides the best fit to the current measurement, indicated by the highest number of inliers, corresponds to the most probable velocity vector V^{K_it}_p of all N_it iteration steps.
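The described scheme can be sketched as below: three random targets define a candidate model, the model with the most inliers (radial-velocity error below T_v) wins, and it is refined on all its inliers. The target geometry of the demonstration is an assumption.

```python
import random
import numpy as np

# Hedged RANSAC sketch for the ego-motion model Q @ V_p = v_r.
T_v = 0.7      # inlier threshold [m/s] (value from the text)
N_it = 100

def ransac_ego_motion(Q, v_r, rng=random.Random(0)):
    n = len(v_r)
    best = np.zeros(n, dtype=bool)
    for _ in range(N_it):
        idx = rng.sample(range(n), 3)          # minimal sample: 3 targets
        try:
            V = np.linalg.solve(Q[idx], v_r[idx])
        except np.linalg.LinAlgError:
            continue                            # degenerate sample
        inliers = np.abs(Q @ V - v_r) < T_v
        if inliers.sum() > best.sum():
            best = inliers
    # re-estimate on all inliers for the final velocity vector
    return np.linalg.pinv(Q[best]) @ v_r[best], best

# Demo: 20 targets split over two assumed sensor positions,
# two of them moving (outliers).
phi = np.linspace(-1.0, 1.0, 20)
sx = np.where(np.arange(20) < 10, 1.0, -1.0)
sy = np.where(np.arange(20) < 10, 0.2, -0.3)
Q = np.column_stack([np.cos(phi), np.sin(phi),
                     sx * np.sin(phi) - sy * np.cos(phi)])
true = np.array([8.0, 0.3, 0.05])
v_r = Q @ true
v_r[[3, 11]] += 5.0                             # two moving targets
V_est, inl = ransac_ego_motion(Q, v_r)
```

The two contaminated measurements exceed the 0.7 m/s threshold for any model close to the true motion, so they are excluded before the final re-estimation.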
As the RANSAC filtering relies on N_it iteration steps and three randomly selected targets, V^{K_it}_p represents the best estimate derived from three stationary targets rather than one based on all stationary targets. In order to enhance the accuracy of the velocity estimation, the vehicle's velocity is re-estimated, this time considering all targets that conform to the model within the threshold tolerance T_v for the velocity vector V^{K_it}_p. This principle is shown schematically for two sensors in Fig. 5. The two sensors are mounted with different orientations and are color-coded. The measurements selected as inliers by the RANSAC algorithm are shown as red-filled scatter points. Detected targets that do not correspond to the most probable velocity model are filled transparently. Based on the selected inliers, the most likely vehicle velocity is estimated, which is shown in green.
2) Dead-Reckoning: Based on the estimated velocity vector Ṽ_p, the vehicle pose P^g_v is determined using the dead-reckoning algorithm. In this case, the pose at time t+1 is determined based on the vehicle position, vehicle velocity, and vehicle orientation at time t. The variable r_v denotes the radius of the curve. The variable dt describes the time interval between two consecutive time steps, and the orientation change follows from the angle of the estimated velocity vector V_p. Since the processing of grid maps is frame-based, the time interval dt in this case corresponds to the reciprocal of the frame rate of the radar sensors, which in the later measurement is 37 frames per second (fps). Dead reckoning is a very simple way to estimate the a posteriori pose of a vehicle. However, since the pose estimation is based solely on the velocity estimation, erroneous velocity estimates are integrated and lead to erroneous vehicle poses for long trajectories. To minimize these errors, the pose is additionally determined by scan matching.
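The frame-by-frame integration of the estimated velocity vector can be sketched as below. This uses the simple first-order update; the paper's exact discretization via the curve radius r_v may differ.

```python
import math

# Hedged dead-reckoning sketch: integrate the body-frame velocity
# (v_x, v_y) and yaw rate omega over the frame interval dt.
def dead_reckoning_step(pose, vx, vy, omega, dt):
    x, y, theta = pose
    # rotate the body-frame velocity into the global frame
    gx = vx * math.cos(theta) - vy * math.sin(theta)
    gy = vx * math.sin(theta) + vy * math.cos(theta)
    return (x + gx * dt, y + gy * dt, theta + omega * dt)

# Demo: constant forward speed 1 m/s and yaw rate 0.1 rad/s at the
# 37 fps frame rate mentioned in the text -> a quarter circle of
# radius v/omega = 10 m, ending near (10, 10).
pose = (0.0, 0.0, 0.0)
dt = 1.0 / 37
for _ in range(581):                      # ~pi/2 of accumulated heading
    pose = dead_reckoning_step(pose, 1.0, 0.0, 0.1, dt)
```

The small residual error against the analytic circle illustrates the integration drift the text warns about: over long trajectories, such errors grow and must be corrected by scan matching.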
Dead reckoning is very well suited for short-term position estimation within a frame, and thus for each transmitted ramp, which is important for SAR processing. Incorrect dead reckoning within a frame leads to a distorted image of the environment during SAR processing. However, since OGMs process target lists and are frame-based, the resulting map is independent of the ramp positions and thus of the dead reckoning. Since the dead reckoning and the scan matching should be as independent as possible for the SLAM algorithm presented later, this property is exploited and the scan matching is processed on the basis of the velocity-independent OGMs.

B. Scan-Matching
Rather than using velocity information, scan matching estimates the pose change of the vehicle based on environment maps. Measurements from different times are compared with each other. The pose difference of these two measurements is subsequently converted into a transformation, which determines relative pose information between the two measurement times. This is sketched in Fig. 6.
In Fig. 6b), the vehicle is still at position (0, 0) in the local vehicle coordinate system, but the targets have shifted. Based on this target shift, the pose change of the vehicle is determined in this section. If this pose estimation is performed by scan-matching methods, the result is the lower sub-image, which is again shown in the vehicle coordinate system at measurement time t. It is evident that the two walls are congruently matched and that the vehicle has moved forward during one frame while going through a left turn.
The aim is to estimate the transformation as precisely and robustly as possible. Since, according to Section IV-B, the sensor environment can be mapped very efficiently with an occupancy grid map for one frame, methods from image registration are applied for scan matching [43], although these have to be adapted to the sensing properties of radar sensors.
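The end product of such a registration is a rigid transform between the two grid maps. Given matched feature positions, this transform has a closed-form least-squares solution via the SVD-based Kabsch/Procrustes method, sketched below; the feature detection and matching themselves (treated in the following subsections) are not reproduced here, and the point sets are illustrative.

```python
import numpy as np

# Hedged sketch: recover the rigid transform (R, t) with obj ≈ R @ ref + t
# from matched 2-D feature positions of two occupancy grid maps.
def estimate_rigid_transform(ref_pts, obj_pts):
    ref_c = ref_pts - ref_pts.mean(axis=0)       # center both point sets
    obj_c = obj_pts - obj_pts.mean(axis=0)
    U, _, Vt = np.linalg.svd(ref_c.T @ obj_c)    # cross-covariance SVD
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:                     # guard against reflections
        Vt[-1] *= -1
        R = (U @ Vt).T
    t = obj_pts.mean(axis=0) - R @ ref_pts.mean(axis=0)
    return R, t

# Demo: random reference features, a known rotation by 0.3 rad and a
# translation of (1, 2) are recovered exactly from noise-free matches.
rng = np.random.default_rng(1)
ref = rng.normal(size=(10, 2))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([1.0, 2.0])
obj = ref @ R_true.T + t_true
R_est, t_est = estimate_rigid_transform(ref, obj)
```

The recovered rotation angle and translation correspond directly to the yaw and position change between the two frames.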
In general, image registration describes a method for determining the transformation between two images, in this case two images of the environment [43]. A basic distinction is made between the reference image (Fig. 6a) and the object image (Fig. 6b). According to Fig. 6c), the reference image is at a fixed position while the object image is shifted onto the reference image as well as possible by means of a transformation. This is done with the help of automated feature detectors such as the robust KAZE features [44], [45], [46]. Scan matching is performed in three basic steps, which are discussed in more detail below:
1) feature detection
2) feature description
3) determination of correspondences
1) Feature Detection: The detection of suitable features is elementary for an accurate scan-matching-based pose estimation, since all subsequent filtering steps are based on the detected features. As the following algorithms have their origin in image processing, the occupancy grid maps are converted to gray-scale images. Each cell m_i then no longer describes the probability p(m_i) but the concentration C(m_i).
To ensure robust and reliable feature detection, the image C(m_i) is simplified in several stages; the result will be referred to as the scale space W in the following. This reduces noise and makes prominent structures such as edges or corners stand out. In the simplest case, this could be achieved by convolving the image with a linear (Gaussian) kernel function B_G(m_i).
The parameter σ_k describes the standard deviation of the Gaussian filter, whereas the parameter s_k describes a factor for the modification of the Gaussian kernel. This parameter will be discussed in more detail later. In general, the image C is smoothed more strongly for higher values of s_k, which results in more noise suppression. This is exemplarily shown for a measurement in Fig. 7a. The problem in this case, however, is that a convolution with a Gaussian kernel smooths not only the noise of the image but also all prominent structures, which are essential for a robust feature extraction. This is remedied by nonlinear kernel functions such as those used in KAZE features [46]. The goal of this nonlinear kernel function is to suppress noise, but not to smooth details and structures. For this purpose, the intensity of the smoothing is adapted to the local image structure. To achieve this, a diffusion description method is applied.
Fick's law describes in general the diffusion and thus the energy-free smoothing process of concentrations. The concentration corresponds to the gray values C(m_i) of the image of the environment. By applying the continuity equation, the diffusion equation to be solved is obtained:

$$\frac{\partial C}{\partial t_s} = \operatorname{div}\left(g\left(|\nabla C_\sigma|\right)\,\nabla C\right) \qquad (35)$$

The time variable t_s corresponds to a scaling factor, which has a direct effect on the diffusion. A higher value of t_s leads to a stronger smoothing and thus a stronger simplification of the image. This value will be discussed in more detail later.
In order to achieve the desired nonlinear diffusion filtering, the conductivity function has to be determined as a function of the gradient, which can be done by the following equation:

$$g = \frac{1}{1 + \dfrac{|\nabla C_\sigma|^2}{k_c^2}}$$

The luminance function C_σ represents a Gaussian-filtered rendition, with standard deviation σ, of the initial image C, and ∇C_σ denotes its gradient. The function g must be chosen according to the image [47]. Since the radar images according to Fig. 4 do not have clear edges and corners, as is the case for example with camera or LIDAR images, the function g is chosen to favor wide regions over smaller regions.
The factor k_c is a contrast factor which influences the degree of diffusion. The diffusion (35) is subsequently solved numerically, since an analytical solution is not possible due to its complexity. One way to do this is the linear-implicit method [48]. In this case, (35) is discretized and solved iteratively for each iteration it:

$$C^{it+1} = \left(I - \tau A\left(C^{it}\right)\right)^{-1} C^{it}$$

The unit matrix is described by I and the discrete time step by τ. A(C^{it}) is a matrix that encodes the image conductivities for each of the two dimensions. The result of such a nonlinear kernel function is shown in Fig. 7b. Especially in comparison to the linear kernel function shown in Fig. 7a, the different weighting of the diffusion in the nonlinear case is visible.
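A single diffusion update can be sketched with an explicit scheme (the paper solves (35) with the more stable linear-implicit method; this explicit Perona-Malik-type step only illustrates how the conductivity g suppresses smoothing across strong gradients; the k_c and τ values are arbitrary):

```python
import numpy as np

def diffusion_step(C, k_c=0.1, tau=0.1):
    # Conductivity g = 1 / (1 + |grad C|^2 / k_c^2): close to one in flat
    # regions (strong smoothing), close to zero at edges (edges preserved).
    gy, gx = np.gradient(C)
    g = 1.0 / (1.0 + (gx ** 2 + gy ** 2) / k_c ** 2)
    # One explicit Euler step of dC/dt = div(g * grad C).
    div = np.gradient(g * gx, axis=1) + np.gradient(g * gy, axis=0)
    return C + tau * div
```

A flat image is left unchanged (zero gradient, zero divergence), while small fluctuations in otherwise flat regions are smoothed out.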
Once prominent structures are detected, they are further highlighted and irrelevant structures are filtered out by Gaussian smoothing.
Based on the described diffusion (35), a nonlinear scale space (NSS) is spanned in the following, from which the KAZE features are extracted. The nonlinear scale space corresponds to a two-dimensional matrix NSS with K_o columns, which will be called octaves in the following, and K_s rows, which will be called sub-levels in the following. Each entry of the two-dimensional NSS contains a specific simplification C_{σ_{n_NSS}} of the image. The different simplifications are described by different standard deviations σ_{n_NSS}. The parameter σ_0 is the base scale level, on which the construction of the nonlinear scale space is based. Since the diffusion (35) is computed in time and not in discrete scale levels, the scale levels σ_{n_NSS} must be converted to diffusion times t_{n_NSS} according to [46].
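Following [46], the conversion is t = σ²/2, and the scale levels grow geometrically across octaves and sub-levels. A small sketch (the base scale σ_0 = 1.6 and the octave/sub-level counts are illustrative assumptions, not values from the paper):

```python
def scale_to_time(sigma):
    # Gaussian smoothing with std sigma corresponds to diffusing
    # for the time t = sigma^2 / 2.
    return 0.5 * sigma ** 2

def scale_levels(sigma0=1.6, n_octaves=4, n_sublevels=4):
    # sigma_i = sigma0 * 2^(o + s / n_sublevels) for octave o, sub-level s.
    return [sigma0 * 2.0 ** (o + s / n_sublevels)
            for o in range(n_octaves) for s in range(n_sublevels)]
```

Each of the resulting σ values is mapped to a diffusion time via `scale_to_time`, and the nonlinear diffusion is evolved to those times to fill the NSS.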
In the case of a Gaussian scale space, convolving an image with a Gaussian kernel of standard deviation σ corresponds to filtering the image for a specific time.
The advantage of the nonlinear scale space is the invariance of the detected features to noise, since the features are extracted from different simplifications and thus for different diffusion times. Based on this NSS, the features are subsequently extracted. For this, a normalized Hessian matrix C_{σ_{n_NSS}, H} is computed for different image levels.
As the amplitude of the spatial derivative decreases with increasing uncertainty σ_{n_NSS}, it must be normalized [49]. The second-order derivatives are approximated by a consecutive application of Scharr filters [50]. To allow a robust estimation, detector responses are computed for different scale levels σ_{n_NSS}. Furthermore, the maximum search is not only performed spatially, and thus in the two-dimensional image, but also cross-level between the different scale levels σ_{n_NSS}. The search for maxima is performed in all levels excluding the first level n_NSS = 0 and the last level n_NSS = K_o · K_s. The search for a local maximum is performed over an area of size σ_{n_NSS} × σ_{n_NSS} on three levels (n_NSS − 1, n_NSS, n_NSS + 1). This is shown schematically in Fig. 8.
For this purpose, the maximum value in this cuboid is determined based on all cells excluding the cell under test (CUT), according to Fig. 8. As soon as this maximum value is smaller than the gray value of the CUT, the CUT describes a local maximum and thus a KAZE feature. Once a KAZE feature is detected, its position is localized to sub-pixel accuracy according to [51] to ensure a more accurate pose estimation. Overall, this procedure extracts all K_kf KAZE features.
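The cuboid test can be sketched as follows: the CUT at (level, y, x) is a feature candidate if its detector response exceeds all 26 neighbors in the 3 × 3 × 3 cuboid over the adjacent scale levels (a simplified sketch with a fixed 3 × 3 spatial window instead of the σ-dependent area):

```python
import numpy as np

def is_local_maximum(nss, level, y, x):
    # Check whether the cell under test (CUT) exceeds all neighbors in the
    # 3x3x3 cuboid spanning levels (level-1, level, level+1).
    # Assumes 0 < level < len(nss) - 1, i.e. the first and last levels
    # are excluded from the search, as described above.
    cut = nss[level][y, x]
    for l in (level - 1, level, level + 1):
        patch = nss[l][y - 1:y + 2, x - 1:x + 2]
        if l == level:
            # Exclude the CUT itself from the comparison on its own level.
            patch = patch.copy()
            patch[1, 1] = -np.inf
        if patch.max() >= cut:
            return False
    return True
```

Only cells passing this test are refined to sub-pixel accuracy and kept as features.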
To further increase the robustness of feature detection, the grid map is not based on a single frame but on many frames. This allows structures to be mapped more clearly, which ensures that KAZE features are detected more reliably. The fusion of several frames is based on dead reckoning. Therefore, the more frames are accumulated, the sharper the structures become, but the more dependent the scan matching is on dead reckoning. Since dead reckoning and scan matching are fused later on, both localization methods should remain independent of each other. Section VI shows that an optimal value is around eight frames.
2) Feature Description and Matching: In order to form correspondences between the features found in each frame, they must be characterized. This is done by descriptors, which describe the environment of a feature.
The feature description is divided into two parts and is done sequentially for each of the K_kf detected KAZE features. First, the orientation of each feature is estimated to make it rotation invariant. In the second step, the actual descriptor is built, which describes the environment of the feature. Based on this descriptor, correspondences to other features are subsequently determined in order to estimate the transformation and thus the relative vehicle pose.
In the first step, the orientation estimation, a circular area with a radius of 6σ_{n_NSS} is defined around the extracted KAZE feature k_kf. In this area, the derivatives in the local x- and y-directions are determined. Subsequently, these derivatives are weighted depending on their distance according to a Gaussian distribution with standard deviation σ = 2σ_{n_NSS}. To find the dominant orientation of the feature k_kf, these local gradients are represented as blue crosses in a vector space according to Fig. 9b.
The dominant orientation is subsequently determined using a sliding-window method. The sliding window has a size of π/3 and is highlighted as a gray area in Fig. 9b. The maximum of the sliding-window method corresponds to the dominant orientation of the feature and is shown as a red arrow in Fig. 9b.
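A sketch of this sliding-window orientation estimate: the gradient samples are swept with an angular window of width π/3, and the window position whose summed gradient vector is longest defines the dominant orientation (the number of window positions is an arbitrary choice):

```python
import numpy as np

def dominant_orientation(gx, gy, n_windows=60):
    # gx, gy: arrays of local gradient samples around the feature.
    angles = np.arctan2(gy, gx)
    best_len, best_angle = -1.0, 0.0
    for b in range(n_windows):
        lo = -np.pi + b * 2.0 * np.pi / n_windows
        # Samples whose angle falls inside the pi/3 wide window at lo.
        mask = ((angles - lo) % (2.0 * np.pi)) < np.pi / 3.0
        sx, sy = gx[mask].sum(), gy[mask].sum()
        length = np.hypot(sx, sy)
        if length > best_len:
            best_len = length
            best_angle = np.arctan2(sy, sx)
    return best_angle
```

A small isolated gradient pointing elsewhere cannot outvote a consistent majority, which is what makes the estimate robust.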
Following the orientation determination, the environment of the feature is described as robustly and efficiently as possible. For this purpose, the M-SURF method [45] is used, which is adapted to the NSS. In the first step, an area of size 20σ_{n_NSS} × 20σ_{n_NSS} is defined around the extracted feature. To make the subsequent feature description rotation invariant, this area is rotated by the dominant orientation of the feature under investigation, as shown in Fig. 9b.
The 20σ_{n_NSS} × 20σ_{n_NSS} area is subsequently divided into 16 identically sized squares. For all 16 squares, corresponding to a sliding window of size 9σ_{n_NSS} × 9σ_{n_NSS}, the derivatives in the x- and y-directions are determined and weighted with a Gaussian distribution with a standard deviation of σ = 2.5σ_{n_NSS} with respect to the feature position. The derivatives are subsequently assigned to a descriptor vector ψ_i.
This is performed for each of the 16 sub-regions, and the results are combined into a common descriptor vector, with each sub-descriptor normalized by a Gaussian distribution with standard deviation σ = 1.5σ_{n_NSS}.
Now a specific and rotation-invariant descriptor has been determined for each feature. Therefore, it is possible to determine correspondences between the features of the reference image and the features of the object image. For this purpose, the squared differences of the elements of the two descriptor vectors to be compared are summed up. As soon as this squared error is below a chosen threshold, the features are marked as a correspondence.
To avoid multiple correspondences between different features, the correspondence with the smallest error is selected in a second iteration, which enables a robust transformation estimation.
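The two matching passes can be sketched as follows (sum-of-squared-differences matching against a threshold, then resolving duplicate assignments by smallest error; the threshold in the test is arbitrary):

```python
import numpy as np

def match_descriptors(desc_ref, desc_obj, threshold):
    # First pass: for each object descriptor, find the reference
    # descriptor with the smallest summed squared difference (SSD).
    matches = []
    for i, d_obj in enumerate(desc_obj):
        errs = np.sum((desc_ref - d_obj) ** 2, axis=1)
        j = int(np.argmin(errs))
        if errs[j] < threshold:
            matches.append((i, j, errs[j]))
    # Second pass: if several object features claim the same reference
    # feature, keep only the correspondence with the smallest error.
    best = {}
    for i, j, e in matches:
        if j not in best or e < best[j][1]:
            best[j] = (i, e)
    return [(i, j) for j, (i, e) in best.items()]
```

The returned pairs (object index, reference index) are the correspondences passed on to the pose estimation.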
3) Pose Estimation: In the final step of image registration, a transformation is estimated from the detected features and the determined correspondences. First, translation and rotation are determined via a procedure that minimizes the squared error between the feature pairs found in Section V-B2 [52]. The corresponding features of the object image are represented below as a two-dimensional point set {p} with p_i = [x_i, y_i], whereas the features of the reference image are represented as a two-dimensional point set {p′} with p′_i = [x′_i, y′_i]. The corresponding feature positions are related by the transformation

$$p'_i = R_{sm}\, p_i + T_{sm}$$

where R_sm describes a rotation matrix with dimensions 2×2 and T_sm a 2×1 translation vector. The most probable matrices R_sm and T_sm are determined via a quadratic error minimization over all found features:
$$\hat{R}_{sm}, \hat{T}_{sm} = \arg\min_{R_{sm},\,T_{sm}} \sum_{i} \left\| p'_i - \left(R_{sm}\, p_i + T_{sm}\right) \right\|^2$$

This equation can be solved very efficiently via a singular value decomposition according to [52]. Based on the rotation matrix and the translation vector, the relative vehicle pose P_v^g = [P_{v,x}, P_{v,y}, P_{v,ψ}] is determined.
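The closed-form SVD solution (the Arun/Kabsch-style method referenced as [52]) can be sketched as:

```python
import numpy as np

def estimate_rigid_transform(p_obj, p_ref):
    # Least-squares rigid transform (R, T) with R @ p_obj + T ~ p_ref,
    # via SVD of the centered cross-covariance matrix.
    mu_o, mu_r = p_obj.mean(axis=0), p_ref.mean(axis=0)
    H = (p_obj - mu_o).T @ (p_ref - mu_r)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:
        # Guard against a reflection; enforce a proper rotation.
        Vt[-1] *= -1
        R = Vt.T @ U.T
    T = mu_r - R @ mu_o
    return R, T
```

Given noise-free correspondences, the true rotation and translation are recovered exactly; with noisy correspondences the result is the least-squares optimum of the expression above.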

C. Graph-Based SLAM
With both dead reckoning and scan matching, the trajectory of the vehicle can be determined solely on the basis of the radar data. However, both methods lead to erroneous pose estimates during longer measurement runs, which significantly degrades the quality of the resulting map, as illustrated in Section VI. Since there is no existing map in which the vehicle could perform a self-localization, a SLAM approach is chosen to determine the most probable pose of the vehicle and the most probable map. For this purpose, a graph-based SLAM is used in the following, because it has significant robustness and speed advantages compared to FastSLAM or Kalman-SLAM algorithms [53].
In this case, the nodes of the graph represent the vehicle poses at different times, while the edges between the nodes represent the constraints between the poses. The nodes are shown as triangles and the constraints as arrows in Fig. 11. These constraints can be obtained in different ways. First, constraints exist between two consecutive nodes based on the dead reckoning from Section V-A and the scan matching from Section V-B of consecutive measurements. These constraints are shown as blue arrows in Fig. 11. Furthermore, the features determined in Section V-B can also be used to establish constraints between non-consecutive nodes, which are shown as orange arrows in Fig. 11.
Once the graph is created, which will be referred to as the front-end, it is solved using an optimization procedure, the back-end, to find a solution that minimizes the error between all constraints.
1) Front-End: The N_n nodes q_{n_n} are combined in a vector q = [q_1, . . ., q_{N_n}]. The nodes q_{n_n} and q_{n′_n} are connected by the edge c_{n_n,n′_n}. The initial constraints c_{n_n,n_n+1} are taken from the dead reckoning and the scan matching and are shown as blue arrows in Fig. 11. Furthermore, general constraints are also entered into the graph. These constraints describe the recognition of already visited locations and are called loop closures. For this purpose, the described scan matching from Section V-B is performed not only for consecutive frames but for all recorded frames. Weighted according to their estimation uncertainty, these constraints are subsequently also added to the graph, shown as orange arrows in Fig. 11. The constructed graph is optimized and solved by the back-end in the following.
2) Back-End: The optimization of the constructed graph is done by minimizing the squared error of all constraints. The error is defined as the difference between the measurement c_{n_n,n′_n} and the expected measurement ĉ_{n_n,n′_n}. The expected measurement can be determined from the current configuration of the graph and its nodes. Thus, the error between the expected and the real measurement can be calculated with

$$e_{n_n,n'_n} = c_{n_n,n'_n} - \hat{c}_{n_n,n'_n} \qquad (48)$$

This error and the constraints are shown in Fig. 12.
As soon as the estimate c_{n_n,n′_n} has an unequally distributed uncertainty, (48) has to be adapted to the weighted squared error

$$e_{n_n,n'_n}^{T}\, \Omega_{n_n,n'_n}\, e_{n_n,n'_n}$$

where the information matrix Ω_{n_n,n′_n} characterizes the uncertainty. Due to the high robustness and consistency of the ego-motion estimation, the uncertainty for the ego-motion-based dead reckoning is assumed to be constant. The local pose uncertainties in (x, y, ψ) for the dead reckoning (31) follow from the standard deviations of the velocity estimate, [σ_vx, σ_vy, σ_ω] = [0.045 m/s, 0.045 m/s, 0.56 °/s], according to the measurement data from Section VI. According to [36], this results in a significantly smaller measurement uncertainty than estimating the velocity with budget IMUs or GNSS. The uncertainty for the scan matching is calculated from the uncertainty of the determined KAZE features. For this, the error corresponding to (46) for the estimate R̂_sm, T̂_sm is calculated for each of the K_kf KAZE features (i ≤ K_kf). While the position errors in the x- and y-directions follow directly, the rotation error has to be determined via a transformation into polar coordinates in relation to the vehicle coordinate system. The covariance is obtained from these per-feature errors. The error function of the entire graph is thus computed as the sum of all errors as a function of the nodes q.
The optimal and most probable node values, and thus vehicle poses, are obtained by minimizing the error over the entire graph:

$$\hat{q} = \arg\min_{q} \sum_{n_n,\,n'_n} e_{n_n,n'_n}^{T}\, \Omega_{n_n,n'_n}\, e_{n_n,n'_n}$$

This problem can be solved via common numerical methods such as the Gauss-Newton method or a modification of it, the Levenberg-Marquardt method. In this work, the framework g2o corresponding to [54] is applied, which is based on the Levenberg-Marquardt method. In this case, the minimum of the error function is approximated iteratively via linearizations.
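The structure of the back-end can be illustrated with a deliberately simplified 1-D pose graph, where nodes are scalar positions and each constraint measures an offset between two nodes; Gauss-Newton then solves the weighted least-squares problem (a toy analogue of the 2-D case, not the g2o implementation):

```python
import numpy as np

def optimize_1d_graph(n_nodes, constraints, n_iter=5):
    # Toy 1-D pose graph: each constraint (i, j, c, w) measures the
    # offset q_j - q_i as c with information weight w. Gauss-Newton
    # minimizes sum w * ((q_j - q_i) - c)^2; node 0 is fixed to remove
    # the global gauge freedom.
    q = np.zeros(n_nodes)
    for _ in range(n_iter):
        H = np.zeros((n_nodes, n_nodes))   # approximate Hessian J^T W J
        b = np.zeros(n_nodes)              # gradient J^T W e
        for i, j, c, w in constraints:
            e = (q[j] - q[i]) - c
            # Jacobian of e w.r.t. (q_i, q_j) is (-1, +1).
            H[i, i] += w; H[j, j] += w
            H[i, j] -= w; H[j, i] -= w
            b[i] -= w * e
            b[j] += w * e
        H[0, :] = 0.0; H[:, 0] = 0.0; H[0, 0] = 1.0; b[0] = 0.0
        q += np.linalg.solve(H, -b)        # Gauss-Newton update
    return q
```

With three odometry constraints summing to 3.1 m and a strongly weighted loop closure of 3.0 m, the optimizer distributes the residual along the chain and pulls the final node toward the loop-closure measurement, just as the loop closures in Fig. 11 correct the drifted trajectory.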

VI. MEASUREMENTS
The measurements to verify the theoretical derivations from the previous sections were performed in a parking lot, as shown in Fig. 13. Driving in the parking lot allows a curved trajectory with a large loop (upper loop) of approximately 180 m length and a small loop (lower loop) of approximately 140 m length.
The measurement setup consists of N_r = 6 incoherently connected chirp-sequence radar sensors, which are exploited for the different signal distribution steps [55]. The sensors are positioned around the vehicle at equal angular intervals of 45°, ensuring a complete 360° field of view, as depicted in Fig. 14. In general, all six sensors are evaluated for ego-motion estimation and self-localization, and only the two sensors S_2 and S_5 are employed for SAR processing.
The precise locations of the sensors relative to the vehicle's coordinate system were determined using a Trimble tachymeter. The orientation of the sensors was estimated by use of self-calibration algorithms [25], [56]. To achieve time synchronization among the radar sensors, an external trigger is employed. By introducing a slight variation in the starting frequencies of the sensors, this synchronization enables interference suppression between them [57].

Fig. 14. Experimental system setup with six chirp-sequence radar sensors mounted on a car according to the sketch from Fig. 2.

TABLE I
USED RADAR PARAMETERS FOR SLAM-BASED SAR PROCESSING

Since all the sensors share identical radar parameters except for their start frequencies, the transmit ramps of each sensor consistently exhibit a frequency offset from one another. As a result, there is no overlap or interference in the IF band. The sensors are only synchronized in time to have accurate time stamps and to avoid interference. The sensors are incoherently networked, so no common oscillator reference is distributed and each sensor operates on its own oscillator. The radar parameters are listed in Table I.
The ground-truth trajectory of the vehicle was determined using a Trimble total station with an accuracy of at least 2 mm to ensure an accurate ground-truth trajectory; it is shown in red in Fig. 15.
Based on this precise ground-truth trajectory, a SAR map can be processed according to Section IV-A; it is shown in Fig. 16 based on the two side-looking radar sensors S_2 and S_5.
Fig. 16 shows not only highly reflective objects such as vehicles or posts, and thus free or occupied parking spaces, but also weakly reflective objects such as a wooden fence (distinctive targets are marked in Fig. 23a). It is clearly visible that this fence is also detectable in the area behind the vehicles, since the synthetic aperture processes a reflection under many different angles and thus a high integration gain exists. This high integration gain also allows the asphalted road to be clearly distinguished from the graveled parking spaces. Thus, with the help of SAR imaging, not only the detection of free parking spaces is possible, but also the identification of parking areas.

A. Dead-Reckoning Versus Scan-Matching
If an ego-motion estimation of perfect quality or a scan matching of perfect accuracy were possible, both would yield an identical SAR map of the environment. However, this is not the case due to measurement inaccuracies and deterministic errors. This becomes clear once the estimated trajectories based on these two localization methods are compared, which is shown in Fig. 15a for dead reckoning and Fig. 15b for scan matching.
The red line represents the ground-truth trajectory, the blue lines describe the ego-motion-based dead reckoning, and the turquoise lines the scan matching. The solid lines represent the trajectories based on a correctly parameterized ego-motion algorithm, whereas the dashed lines represent the results based on an incorrectly configured ego-motion algorithm. An incorrect parameterization of the ego-motion algorithm from Section V-A is present as soon as the threshold is set too low with T_v = 0.1 m/s, whereas a correct parameterization is given for a threshold of T_v = 0.7 m/s, resulting in all relevant targets being considered for ego-motion estimation. By choosing the correct threshold, the average trajectory error over all frames is reduced from 2.66 m to 1.02 m.
A similar behavior can be observed for the scan-matching algorithm in Fig. 15b. Although the algorithm in general does not depend on the ego-motion estimation, the scan matching becomes significantly worse as soon as an incorrect threshold T_v is chosen for the ego-motion estimation (dashed turquoise line). The main reason for this is that, as described in Section V-B, several frames are combined to obtain a higher target density and thus to better detect the KAZE features. The position of the vehicle for all sub-frames to be combined is determined based on ego-motion estimation and thus dead reckoning. The number of sub-frames to be evaluated has a significant influence on the quality of the scan matching, which is shown in Fig. 17. The coloring of Fig. 17 is identical to that of Fig. 15. It is obvious that the number of sub-frames is irrelevant for the ego-motion estimation and the subsequent dead reckoning, which is shown in blue. Pose estimation based on scan matching, on the other hand, depends significantly on the number of sub-frames evaluated. As soon as only one frame is used, too few targets are detected in general, which means that no prominent structures can be detected in the grid maps, and thus the average position error over the whole trajectory is between 6 m and 8 m. On the other hand, it can be clearly seen that the more ego-motion-based sub-frames are linked together, the more the average error is reduced, ultimately even improving on the accuracy of the ego-motion-based dead reckoning.
The trade-off between best scan matching and an estimation independent of ego-motion-based dead reckoning lies at about eight evaluated sub-frames, which are therefore used for scan matching in the following. Ultimately, however, neither a correctly parameterized ego-motion estimation nor a correctly parameterized scan matching guarantees an ideal trajectory estimation. This can also be seen when a SAR map is created based on dead reckoning with a correctly chosen threshold, as shown in Fig. 18.
It is noticeable that a reasonable picture of the surroundings can already be generated at first glance, and free parking spaces can already be identified. On closer inspection, however, it becomes apparent that the image of the environment is distorted in some places and that double structures exist even though they do not occur in reality. This applies to the wooden fences, which are no longer horizontal but convex and concave, as well as to the vehicle contours, which are heavily smeared. This is the result of the integration error of the dead reckoning, whereby every velocity error leads to an ever larger trajectory error. This error is minimized in the following by using the SLAM algorithm presented in Section V-C.

B. Graph-Based SLAM
The trajectory estimated via ego-motion estimation and dead reckoning is used as the base trajectory. From this, the first configuration of nodes and the edges connecting consecutive nodes are generated. The sub-maps generated for each node are compared for loop-closure detection. If a transformation can be estimated, a constraint is created between the two nodes. The result after optimizing the graph (Section V-C) is shown in Fig. 19.
Fig. 19a illustrates the results with a correctly selected RANSAC threshold T_v (solid lines), while Fig. 19b illustrates the results of an incorrectly selected threshold (dashed lines). The color coding corresponds to the previous figures: red corresponds to the ground-truth trajectory, blue to the ego-motion-based dead reckoning, and orange to the graph-based SLAM pose estimation. The black lines denote the constraints based on the loop closures.
It can be clearly seen that the graph-based SLAM again significantly improves the trajectory estimation. With a well-parameterized ego-motion estimation, shown in Fig. 19a, the average error is reduced from 1.02 m to 0.59 m. The same phenomenon is evident for the evaluation with a badly parameterized ego-motion estimation, shown in Fig. 19b. In this case, the average error is reduced from 2.66 m to 0.83 m.
Ultimately, however, it is noticeable in both estimates that the estimate in the range y = 21 m has errors of up to one meter, although the graph-based SLAM has already been applied (orange trajectory). Since this area was not traversed several times, no loop closures can be found there, and thus there are no additional constraints over which the graph could be optimized. To solve this problem, the scan matching from Section V-B is applied not only to consecutive frames but also to frames that have a larger time distance than t = 1 from each other. This results in redundancy, which can be added to the graph as constraints between the nodes to again significantly improve the estimation. This principle is illustrated in Fig. 20, where each node q_{n_n} has an additional exemplary constraint to the node q_{n_n+5}. The distance should be chosen in such a way that objects are detected from different perspectives, but it must remain small enough that identical objects can still be detected. The result of this optimized graph is also shown in Figs. 19a and 19b as the green trajectory. It is clearly visible that the estimated trajectory and the ground-truth trajectory are almost perfectly superimposed, and thus an optimized estimation is possible. The average error is again reduced from 0.59 m to 0.28 m for the well-parameterized ego-motion and from 0.83 m to 0.33 m for the badly parameterized ego-motion. This illustrates not only a significantly increased precision of the localization algorithm but also a significantly increased robustness: regardless of whether the ego-motion was parameterized well or badly, the final pose estimates differ by only 17 %, compared to an initial pose estimation error difference of more than 250 %.
The resulting map is shown in Fig. 21. Compared to the ground-truth-based SAR map in Fig. 16, no obvious differences can be seen. This is quantitatively illustrated by the three lanterns in the center of the image at y = 20 m, by comparing their positions with the positions from the ground-truth image corresponding to Fig. 16. The average position deviation is only 7 cm, making the map a true representation of the environment.

C. Evaluation of Longer Trajectories
The previous investigations were all performed for a measurement trajectory with a constant measurement duration of 30 s. However, the described SLAM algorithm for the generation of high-resolution environment images is also applicable to significantly longer trajectories, as illustrated in Fig. 22. The measurement duration is approximately 140 s, with just under 500 m traveled during this time. Following the previous convention, the ground-truth trajectory is shown in red, the ego-motion-based trajectory in blue, and the SLAM-based trajectory in green. The average absolute localization error over all 4800 evaluated frames is 42 cm. This error is only 50 % larger despite a measurement run more than twice as long as in Section VI-B. A visual comparison between ground-truth-based SAR imaging and SLAM-based SAR imaging is shown in Fig. 23.
It is evident that the images look qualitatively identical. No double structures or distorted wooden fences can be identified. Furthermore, the gravel of the parking bays is still clearly distinguishable from the road. The reason for this is the small localization error within small sub-apertures, which affects the SAR processing the most. For a synthetic aperture of 16 cm, which corresponds to one frame and thus 384 evaluated ramps, the standard deviation is 1.2 mm. For a synthetic aperture size of 96 cm, the standard deviation is 17 mm. This corresponds to a deviation of only four wavelengths for a synthetic aperture of almost one meter. This minimizes the influence of the localization error on the environmental mapping, as illustrated in the following.
In a SLAM algorithm, not only the accuracy of the trajectory estimation is crucially important but also the accuracy of the map. For this purpose, the positions of a total of nine defined targets (posts) were investigated. The comparison between the ground-truth-based SAR processing from Fig. 23a and the SLAM-based SAR processing from Fig. 23b shows an average localization error of 16 cm, ensuring accurate mapping even over large areas. Another way to compare the quality of the image is to determine the signal-to-clutter ratio (SCR). For this, the SCR is determined based on the wooden fence (signal) and the gravel, and thus an empty parking area (clutter), according to Fig. 23a. The SLAM-based SAR image shows a lower SCR by only 0.35 dB, so that weakly reflecting targets can still be clearly distinguished from the gravel, although the position of the vehicle is estimated exclusively on the basis of the radar data.

D. Evaluation of Optimal Number of Sensors
The SLAM measurement results from Section VI are based on all N_r = 6 installed radar sensors. The extent to which the number of sensors is decisive for dead reckoning, scan matching, and the subsequent graph-based SLAM is shown in Fig. 24.
Fig. 24 illustrates the relationship between the number of sensors and the average logarithmic trajectory-estimation error in meters for the different methods. Analogous to the previous measurement results, the blue curve indicates trajectory estimation based on dead reckoning, the turquoise curve trajectory estimation based on scan matching, and the green curve trajectory estimation based on graph-based SLAM. It is clearly visible that, on average, the accuracy of the trajectory estimation can be significantly increased with an increasing number of sensors. This is especially true for the trajectory estimation based on dead reckoning (blue): already purely based on dead reckoning, the error is reduced from almost 8 m to only 1 m with six sensors.
A similar behavior is also evident for scan matching, although the estimation with six sensors brings no advantage compared to the estimation with only four sensors. In contrast to ego-motion estimation, where the data is evaluated jointly but each target is processed independently, the detected targets smear in the joint sub-map, as shown in Fig. 4. Moreover, since the position and orientation of the sensors in relation to the vehicle coordinate system can only be estimated, the target peaks in the evaluated sub-map widen when multiple detections are made by different sensors.
The graph-based SLAM finally merges the ego-motion-based dead reckoning and the scan-matching-based trajectory estimation. This results in the green curve in Fig. 24, which illustrates a significant reduction of the estimation error for up to four sensors. The improvement from four to six sensors is correspondingly small. This is largely due to the scan-matching-based loop closures, which, according to the green curve, do not improve with more sensors, since the targets do not produce significantly defined KAZE features when the relative pose estimation of the radar sensors is inaccurate.

E. Impact of Localization Inaccuracies on the SAR Image
According to Section VI-C, the average position error for the long trajectory is 42 cm, which is more than 200 times larger than the λ/2 requirement for SAR applications. However, according to Fig. 23a, this has almost no effect on the imaging quality and causes no defocusing. This apparent contradiction can be traced back to the following three factors:
• limited detection range of the radar sensors
• attenuation
• frame-based error

1) Limited Detection Range of Radar Sensors: The average position error of 42 cm accumulates over the entire trajectory. For the final evaluation, however, the relative error in short sections of the trajectory is relevant. The parking lot, with its length of 100 m and width of 60 m, is so spatially extended that the radar sensors cannot cover it completely with one measurement, on the one hand because of occlusion effects and on the other hand because of the limited maximum range of the radar sensors of 25 m. The relative error between the vehicle positions that are farthest from each other on the trajectory, and thus have a relative error of more than 42 cm to each other ((−15 m| − 15 m) and (30 m|20 m)), is consequently irrelevant for the SAR focusing and only distorts the final image globally.
2) Attenuation: Another factor is the influence of the trajectory on the SNR of a target. The received power at any synthetic antenna position depends significantly on two factors (in the case of an ideal target): first, on the distance due to the r^4 dependence, and second, on the angle of arrival and thus the angle-dependent attenuation due to the antenna pattern [58]. This leads to the fact that very large synthetic apertures bring only a small performance advantage compared to much smaller synthetic apertures, since the additional accumulated energy of large apertures is negligible. This is shown in Fig. 25 for a simulation without position error. For this, a typical automotive scenario with a point target at position t = (0 m|5 m) is studied. In each case, the synthetic aperture is centered on the target. As an example, an aperture with a length of 1 m results in a trajectory from (−0.5 m|0 m) → (0.5 m|0 m). Fig. 25 illustrates the SNR loss of a short synthetic aperture compared to a long synthetic aperture. The loss is shown relative to the maximum SNR and thus to the maximum aperture length. The SNR of the target at position t = (0 m|5 m) is 7.8 dB lower for an aperture length of 96 cm than for an aperture length of 25 m. For a synthetic aperture length of 5 m, the difference is only 1.7 dB, which is marked in red in Fig. 25. According to the normalized measurement results from Fig. 23, the image of the environment is not SNR-limited but limited by the ground clutter resulting from the asphalt and gravel. This makes even the 7.8 dB SNR loss of the 96 cm trajectory negligible.

Fig. 26. Illustration of different position errors (a) and their effect on SAR mapping (b): green, ground-truth trajectory; red, random positioning; blue, noise corresponding to self-localization.
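The saturation can be reproduced with a crude model: amplitudes (falling off as 1/r² for a power falloff of 1/r⁴) add coherently along a centered aperture, while the noise power grows with the number of positions; the antenna pattern is neglected and the sampling density is an arbitrary assumption:

```python
import numpy as np

def aperture_snr_db(aperture_len, target=(0.0, 5.0), samples_per_m=100):
    # Synthetic antenna positions along a centered linear aperture.
    n = max(int(aperture_len * samples_per_m), 2)
    xs = np.linspace(-aperture_len / 2.0, aperture_len / 2.0, n)
    r = np.hypot(xs - target[0], target[1])
    # Coherent sum: amplitudes (~1/r^2) add, noise power grows with n,
    # so the integrated SNR scales as (sum of amplitudes)^2 / n.
    snr = np.sum(1.0 / r ** 2) ** 2 / n
    return 10.0 * np.log10(snr)
```

Even this toy model shows diminishing returns: the SNR gained by growing the aperture from 5 m to 25 m is much smaller than the gain from 0.96 m to 5 m, mirroring the saturation discussed above.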
3) Frame-Based Error: The 96 cm long trajectory corresponds to exactly 6 evaluated frames in the measurement. According to chapter VI-C, the measurements show that the standard deviation of the position estimate for this trajectory length is 17 mm and thus roughly 10 times higher than λ/2. Nevertheless, according to Fig. 23a, no visual differences occur. This is because the position error is not completely random, as shown in Fig. 26a. There, the position error for 6 frames of 384 ramps each is shown for no position noise (green), random position noise (red), and the real position noise from the measurement (blue).
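The structure of this frame-based error can be sketched as follows (illustrative, assumed parameters; not the measured data of Fig. 26a): within a frame the pose comes from dead reckoning only, so the error drifts smoothly, and after each frame the pose is re-estimated, producing a small jump between consecutive frames.

```python
import numpy as np

rng = np.random.default_rng(0)

frames, ramps = 6, 384
err = np.empty(frames * ramps)              # position error per ramp, in m
pos = 0.0                                   # running error at the frame start
for k in range(frames):
    drift = rng.normal(0.0, 0.004)          # integrated velocity error per frame (assumed)
    frame_err = pos + drift * np.linspace(0.0, 1.0, ramps)
    err[k * ramps:(k + 1) * ramps] = frame_err
    pos = frame_err[-1] + rng.normal(0.0, 0.001)  # re-estimation jump (~1 mm, assumed)

steps = np.abs(np.diff(err))
frame_edges = np.arange(1, frames) * ramps - 1   # indices of the 5 boundary steps
in_frame = np.delete(steps, frame_edges)
print(f"largest in-frame step: {in_frame.max() * 1e6:.1f} um, "
      f"largest frame-boundary step: {steps[frame_edges].max() * 1e3:.2f} mm")
```

The resulting error is piecewise continuous: ramp-to-ramp changes within a frame are in the micrometer range, while the visible discontinuities occur only at the five frame boundaries.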
The position error of the measurement exhibits a distinctly piecewise continuous pattern. Each frame, comprising 384 ramps, relies solely on dead reckoning, causing a continuously evolving position error attributed to an inaccurate estimation of the ego-motion. After each frame, the vehicle's position is re-estimated through scan matching, cross-frame dead reckoning, and graph-based SLAM, resulting in discontinuities in the estimate. The high accuracy of dead reckoning, coupled with the cross-frame constraints, yields a trajectory that is piecewise continuous, marked by minor discontinuities between consecutive frames. The SAR processing results of these three trajectories are shown in Fig. 26b in cross-range for a range of 5 m. It can be clearly seen that the ground-truth trajectory (green) and the estimated trajectory (blue) lead to almost identical results, while the evaluation of the randomly perturbed trajectory (red) no longer allows SAR processing. The reason for accurate SAR processing based on the estimated trajectory is the piecewise continuity of the position error. A sharp SAR image of the environment can be generated for each of the six sub-apertures even though they have an error of 1.2 mm according to chapter VI-C. Due to this ego-motion-based error, the targets have small angular errors of less than one degree [17], [59]. Ultimately, to enable coherent addition of the six sub-apertures, it must be ensured that a pixel can be found whose erroneous position in the grid compensates for the phase errors caused by the position inaccuracies. It is apparent from Fig. 26b that such a position can be found adjacent to the green peak. Since the height of the peak is identical, a coherent addition and thus a precise angle estimate was possible.
To show this in general, the result of a Monte-Carlo simulation with 50000 trials is presented below. For each trial, a target is positioned at a distance of 5 m from the synthetic aperture at a random angle. Subsequently, the ground-truth trajectory, a randomly perturbed trajectory, and a piecewise continuous trajectory corresponding to the measurement are simulated according to Fig. 26a with a standard deviation of 17 mm. The performance is evaluated using the results in Table II.
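A single, heavily simplified trial of such a simulation can be sketched as follows (far-field, 1-D cross-range only, target broadside; illustrative, assumed error parameters, not the authors' full 50000-trial setup): a 96 cm aperture with 6 frames of 384 ramps each observes a point target, and a range-direction position error e_n leaves the residual phase 4π·e_n/λ uncompensated in the back-projection.

```python
import numpy as np

rng = np.random.default_rng(1)

lam = 3e8 / 77e9                        # wavelength at 77 GHz, ~3.9 mm
n = 6 * 384                             # synthetic antenna positions
x = np.linspace(-0.48, 0.48, n)
u = np.linspace(-0.15, 0.15, 1201)      # sin(angle) search grid
A = np.exp(1j * 4 * np.pi / lam * np.outer(u, x))  # steering matrix

def peak(err):
    """Maximum of the cross-range profile for a given position error."""
    s = np.exp(-1j * 4 * np.pi / lam * err)   # residual phase per sample
    return np.abs(A @ s).max()

e_rand = rng.normal(0.0, 0.017, n)      # fully random error, 17 mm std
e_piece = np.empty(n)                   # piecewise-continuous error (assumed params)
c = 0.0
for k in range(6):
    drift = rng.normal(0.0, 0.0012)     # ~1.2 mm dead-reckoning drift per frame
    e_piece[k * 384:(k + 1) * 384] = c + drift * np.linspace(0.0, 1.0, 384)
    c = e_piece[(k + 1) * 384 - 1] + rng.normal(0.0, 0.001)  # re-estimation jump

p0, pr, pp = peak(np.zeros(n)), peak(e_rand), peak(e_piece)
print(f"peak without error {p0:.0f}, random noise {pr:.0f}, piecewise {pp:.0f}")
```

Even in this toy setup the fully random 17 mm error destroys the coherent peak, while the piecewise-continuous error of the same order of magnitude still yields a pronounced maximum, mirroring the behavior in Table II.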
As expected, the best results are obtained if the trajectory is known exactly and thus no position errors exist. For this simulation, an average maximum peak of just under 30 dB and an integrated sidelobe ratio (ISLR) of nearly −32 dB result. It should be noted that the ISLR is normalized to the width of the main lobe, which means that the average received power outside the main lobe is on average 32 dB lower than the peak. Since this simulation contains no position error, there are no erroneous angles under which the targets are detected. In the Monte-Carlo simulation with random position noise, a peak with 15 dB less power was still detected, but on average its localization is wrong by about 25° and therefore essentially random, which makes reliable target localization impossible. Furthermore, the ISLR shows that the peak found is only minimally above the average noise and sidelobe level.
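The ISLR definition used here (average power outside the main lobe, normalized to the peak power) can be sketched for an ideal, error-free 96 cm aperture; the numbers are only meant to illustrate the metric, not to reproduce the table values.

```python
import numpy as np

lam = 3e8 / 77e9                            # wavelength at 77 GHz
x = np.linspace(-0.48, 0.48, 2304)          # synthetic antenna positions
u = np.linspace(-0.15, 0.15, 1501)          # sin(angle) grid
profile = np.abs(np.exp(1j * 4 * np.pi / lam * np.outer(u, x)).sum(axis=1)) ** 2

k = int(profile.argmax())
left, right = k, k                          # walk outwards to the first nulls
while left > 0 and profile[left - 1] < profile[left]:
    left -= 1
while right + 1 < profile.size and profile[right + 1] < profile[right]:
    right += 1
outside = np.concatenate((profile[:left], profile[right + 1:]))
islr = 10 * np.log10(outside.mean() / profile[k])
print(f"ISLR of the ideal aperture: {islr:.1f} dB")
```

For this error-free uniform aperture, the average power outside the main lobe comes out on the order of 30 dB below the peak, consistent with the order of magnitude discussed above.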
For the position uncertainty based on the presented SLAM algorithm, very similar average values as for the simulation without position noise are obtained despite the high uncertainty of almost 2 cm. The maximum peak power is on average only 2 dB lower, and the average ISLR is only 5 dB higher, which means that unwanted signal components remain clearly below the peak. This also explains the low angular error of the simulated target of less than 0.4°. As long as the required accuracy is maintained within each frame and the targets are located close to the radar sensor, cross-frame position errors only lead to a minimal degradation of the results, which is not visually identifiable.

VII. CONCLUSION
The algorithm described in this article enables, for the first time, SAR processing in the automotive domain in an unknown environment without having to rely on external localization systems. The algorithm enables self-sufficient self-localization based solely on radar sensors. This allows high-resolution mapping that is completely independent of GNSS, IMU, or other sensors. It has been shown metrologically that the presented algorithm achieves an average absolute localization accuracy of 42 cm and an average relative localization accuracy within one frame of only 2.5 mm over a driving duration of more than 2 minutes and trajectories of up to 500 m. Furthermore, it was shown that this application also benefits from cascadable radar sensor networks. A comparison between the ground-truth-based mapping and the SLAM-based mapping reveals that the signal-to-clutter ratio is only 0.35 dB lower for the SLAM-based mapping, resulting in only a minimal loss of dynamic range. With the help of a Monte-Carlo simulation it could be shown that, under certain circumstances, the position accuracy may be many times worse than the wavelength without leading to defocused images. The presented algorithm not only enables high-precision environment mapping with 77 GHz chirp-sequence radar sensors but also precise self-localization in unknown environments based solely on these sensors, allowing flexible use in arbitrary automotive applications.

Fig. 4. Representation of the free space model only (4a) and the free space model + target model (4b) for an exemplary measurement.

Fig. 5. Representation of the ego-motion estimation based on detected targets of a simulated straight-line drive for two sensors with the orientations ϕ_c = [ϕ_c1 ϕ_c2] = [0° 90°] and the corresponding inliers.

Fig. 6. Illustration of the scan matching principle. a) Measurement at time t, b) measurement at time t+1, and c) estimated transformation of the vehicle pose based on the reference image and the object image.

Fig. 7. Illustration of the convolution with a linear kernel function (a) and a non-linear kernel function (b).

Fig. 8. Search for local maxima in multidimensional space for feature detection.

The local derivatives ∂/∂x C_NSS^σn in the x-direction and ∂/∂y C_NSS^σn in the local y-direction are formed to determine the orientation of the derivative, as shown in Fig. 9a.

Fig. 9. Process of feature description. a) shows the local derivatives in the x- and y-direction. b) shows the process of estimating the dominant orientation of the corresponding feature.

Fig. 11. Sketch of a graph with multiple nodes, consecutive constraints (blue), and constraints used for loop closures (orange).

Fig. 12. Illustration of two nodes q_nn and q_n′n and the corresponding estimated constraint c_nn,n′n and measured constraint ĉ_nn,n′n.

Fig. 13. Bird's eye view of the measurement environment with Google Earth ©2023, Kartendaten ©2023 GeoBasis-DE/BKG (©2009). The satellite image does not reflect the real vehicle positions due to different recording times.

Fig. 15. Trajectory estimation based on dead reckoning (a) and on scan matching (b), for a correctly chosen threshold T_v (solid) and an incorrectly chosen threshold T_v (dashed). The ground-truth trajectory is shown in red.

Fig. 16. SAR-based representation of the environment on the basis of the ground-truth trajectory.

Fig. 17. Illustration of the influence of the number of sub-frames on the ego-motion-based dead reckoning and the scan-matching-based trajectory estimation.

Fig. 18. SAR-based representation of the environment on the basis of dead reckoning, with an enlargement of the double structure of the wooden fence.

Fig. 20. Illustration of stepped scan matching for well-defined constraints without loop closures.

Fig. 21. SAR-based representation of the environment on the basis of the graph-SLAM-based trajectory.

Fig. 22. Trajectory estimation based on dead reckoning and the optimized graph-based SLAM in comparison to the ground-truth trajectory.

Fig. 23. SAR-based representation of the environment on the basis of the ground-truth trajectory, depicted in a), and based on the optimized SLAM, depicted in b).

Fig. 24. Influence of the size of the radar sensor network on the accuracy of the trajectory estimation for dead reckoning, scan matching, and the graph-based SLAM.

Fig. 25. SNR loss of a short synthetic aperture compared to a long synthetic aperture, shown relative to the maximum SNR.

Fig. 26. Illustration of different position errors (a) and their effect on SAR mapping (b). Green: ground-truth trajectory, red: random positioning, blue: noise corresponding to the self-localization.
Simultaneous Localization and Mapping (SLAM) for Synthetic Aperture Radar (SAR) Processing in the Field of Autonomous Driving
Timo Grebner, Graduate Student Member, IEEE, Ron Riekenbrauck, Graduate Student Member, IEEE, and Christian Waldschmidt, Fellow, IEEE

The figure consists of three sub-figures. The upper left figure shows the environment of a vehicle in relation to the vehicle at time t. The vehicle is surrounded by a possible target, a wall, on the front side and on the left side. As soon as the vehicle is in motion and a second measurement is performed at time t+1, a different relative environmental image results, as shown in the upper right sub-image.

TABLE II
MONTE-CARLO SIMULATION TO EVALUATE THE SAR PERFORMANCE BASED ON TRAJECTORIES WITH DIFFERENT TYPES OF NOISE