R3O: Robust Radon Radar Odometry

Being able to measure and track the position of mobile systems is an important capability in many applications, autonomous driving being one major example. With the Robust Radon Radar Odometry algorithm, this article presents an approach for estimating the odometry of vehicles based on radar data only. The algorithm embodies a robust method for estimating the change in orientation as a key feature. The odometry algorithm belongs to the class of direct methods, and it exploits properties of the Fourier transform to decouple the changes in orientation from the changes in translation. In the first step, the Radon transform is combined with phase-correlation, outlier removal, a robust measure of central tendency, keyframe selection and graph optimization in order to achieve a robust estimate of the change in orientation; next, the translation is estimated with the support of phase-correlation. The algorithm's performance was evaluated with real-world data. Significant improvements in position and orientation error, in terms of relative pose error and the KITTI odometry error metric, are shown compared to other direct methods for radar-based odometry.

I. INTRODUCTION

Self-localization is among the many capabilities that are required for vehicles to achieve highly automated driving (HAD), i.e. levels 4 and 5 of autonomy. The usual choice of sensors in self-localizing vehicles is to combine information from Global Navigation Satellite Systems (GNSS) with Inertial Measurement Units (IMU) [2]. However, GNSS-based estimation is impaired in GNSS-denied regions and under multi-path conditions. IMU-based odometry estimation also has its pitfalls. For example, drift error, which affects any odometry algorithm, is aggravated with IMUs due to the double integration required to calculate position from acceleration data [3].
The aforementioned requirements regarding a vehicle's autonomy, as well as the possible drawbacks of the common methods for self-localization, motivate the analysis of other possibilities for the self-localization of vehicles. One alternative is to consider radar sensors. Radar is a pivotal sensor in the development of HAD due to, e.g., its high robustness under many conditions and its long-range capabilities [4].
Radar ego-motion estimation is currently a topic of intensive research [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21]. The approaches for calculating odometry using radar data can be roughly divided into direct and indirect odometry methods [6]. The first category relies on estimating the change in orientation and position using area-based methods, while the latter falls under the realm of feature-based methods. Direct and feature-based odometry methods have their own advantages and disadvantages. For example, one advantage of direct methods is that they do not need to perform a feature detection step; conversely, one advantage of feature-based methods is that they usually have lower computational complexity.
Direct radar odometry methods usually calculate the ego-motion by exploiting the decoupling between rotation and translation in the magnitude spectrum of an image.
In this article, we present an algorithm named Robust Radon Radar Odometry (R3O), which consists of a direct method for calculating the odometry of an agent, e.g. a vehicle, by exclusively using scanning radar data.

Fig. 1. Diagram of the R3O algorithm. The diagram shows that a good rotation estimate is important not only for the orientation estimate itself, but also for the translation estimation.

R3O combines the advantages of the direct radar odometry method with a procedure for calculating the rotation between radar image pairs that is more robust to interpolation errors, rotationally dependent aliasing and noise. The improvement in robustness is achieved by exploiting a property of the Radon transform along with filtering steps for removing outliers and less reliable estimates. The first step in R3O's rotation estimation consists of finding multiple candidates for the rotation using the Radon-domain images. Next, an outlier removal method removes anomalous candidates. Then, a robust location estimator and a robust scale estimator are used to calculate the rotation and its uncertainty. In addition, a keyframe method is used which dismisses pairs with high uncertainty in their rotation estimates. Finally, a local optimization is employed in order to refine the orientation estimate. The benefits of having a robust method for estimating the rotation are at least twofold. First, by improving the rotation estimate between pairs, the overall orientation estimate will clearly tend to be more accurate. Second, since in direct odometry methods one of the images is generally rotated using the estimated rotation before the translation estimation, a more reliable rotation estimate will likely lead to better performance in the translation estimation, which in turn induces a more reliable position estimate.
In particular, we judge that the innovative odometry pipeline in R3O (see Fig. 1), i.e. the use of multiple candidates for rotation estimation by exploiting the Radon transform and the subsequent steps based on the statistics of these candidates, including filtering the candidates as well as keyframe selection and graph optimization based on the uncertainty estimated from the candidates, represents an important development for direct odometry algorithms. For example, rotation estimation and uncertainty estimation exploring other statistical methods are some of the possibilities that our framework enables.

This article is organized as follows: in Section II the related work is discussed, followed in Section III by the theoretical foundations that are necessary for implementing the proposed algorithm. After R3O is explained in Section IV, its performance is evaluated in Section V. In Section VI an ablation study along with a computational complexity evaluation is presented, and a conclusion is provided in Section VII.

II. RELATED WORK

A. Direct Odometry
The idea behind direct methods can be traced at least back to [22]. In that canonical article the goal was to calculate the translation, rotation and scaling between a pair of images. For that, the authors used the rotation and translation properties of the Fourier transform (FT), which lead to the following conclusion: when comparing two images, where one is a reference image I_r and the second, called I_m, is a translated, rotated and scaled replica of I_r, the magnitude of the FT of I_m is invariant to translation and is therefore a rotated replica of the magnitude of the FT of I_r. Following this, the two magnitude spectra are converted to their polar representation (when scaling estimation is not relevant) or to their log-polar representation (when both scaling and rotation are to be estimated, which is in general not the case in odometry estimation); then, the polar or log-polar pair of images is phase-correlated, and the resulting peak represents the estimated rotation, or the estimated rotation and scaling, respectively. Finally, the rotation (and scaling) is compensated in one of the two Cartesian images using the estimates, and the translation between the pair of images is estimated by phase-correlating them, where the peak of the correlation represents the translation between the pair.
In [23] the authors used the aforementioned rationale for performing scan matching of sonar data. In that work, since scaling registration was not relevant, the authors used the variant of the algorithm which converts the magnitude spectra to the polar representation. Moreover, that research group extended their algorithm to 3D mapping in [24]. Furthermore, in [25] it is argued that it may be worthwhile to directly use a polar image, i.e. a range-azimuth image, for estimating the rotation between the two images. According to that work, when the translation is relatively small, the error caused by the coupling between translation and rotation in the spatial domain might be small enough that the motion estimation can be done as follows: estimate the rotation between the pair by phase-correlating the image pair in the polar domain; next, convert the polar images to the Cartesian representation. Finally, compensate the rotation in the Cartesian domain using the estimated rotation, followed by phase-correlating the Cartesian images in order to estimate the translation between the pair.

B. Scanning Radar Odometry
In radar odometry we can classify the odometry approaches into feature-based and direct [26], [27]. Further, we can also group the radar odometry approaches based on the type of radar that is used, i.e. single-chip or scanning radar. Because the scope of this article is scanning radar, in this section we will discuss the related work with this kind of radar.
Feature-based odometry approaches utilizing a scanning radar can be traced back to Cen et al. [5], [6]. The algorithm of Cen et al. [5] consisted of finding features in each azimuth bin by locating the peaks in a 1D signal combining the low- and high-frequency components of that azimuth's signal; a scan-matching approach tailored to radar data was also included. Kung et al. [15] developed an algorithm based on the Normal Distribution Transform along with a filter for radar images in which pixels below a threshold were not considered feature candidates. Moreover, Hong et al. in [13] used SURF features [28] for radar odometry and SLAM, and in [14] introduced an odometry and SLAM approach based on radar data in which a blob feature detector, based on a Hessian matrix, finds keypoint candidates, which are further filtered by observing their homogeneous spatial distribution. In addition, [14] employed [29] for data association as well as implementing a motion-distortion correction. Furthermore, Burnett et al. proposed HERO in [30], using a deep neural network, unsupervised learning and Exactly Sparse Gaussian Variational Inference for finding keypoints and estimating the relative pose. Another approach using neural networks and supervised learning was proposed by Barnes et al. [7]. Furthermore, Aldera et al. proposed in [8] a mechanism to detect failure patterns in the feature matching step, and in [9] a smooth-curvature constraint based on the Ackermann drive model was exploited. Recently, Lim et al. presented the ORORA algorithm, which uses anisotropic component-wise translation estimation and graduated non-convexity based rotation estimation to estimate the rotation and translation separately. Finally, Adolfsson et al. [10], [11], [12] developed the CFEAR algorithm, which is the current state-of-the-art radar odometry algorithm. In [12] the latest versions of CFEAR, namely CFEAR-3 and CFEAR-3-s50, use k-strongest filtering for pre-filtering the radar image as well as motion compensation and surface point distance minimization in the registration step.
The pivotal work in direct methods for odometry estimation using scanning radar data can be traced back to Checchin et al. [19], where the authors followed the procedure previously explained in [22] in order to estimate the position and orientation of the agent; the relative motion estimation was further used for performing Simultaneous Localization and Mapping (SLAM) using radar data. Moreover, Park et al. [31] proposed a coarse-to-fine strategy for estimating the motion. Also, a keyframe selection approach was employed, as well as pose graph optimization, resulting in a trajectory with pose correction as well as a map of the traversed region.
Additionally, still within the scope of direct methods, Barnes et al. [21] deployed a supervised learning approach which employed a brute-force search to find the pose between frames using correlative scan matching, along with a filter used to reduce the effect of, e.g., noise and occlusion. Following the work from [21], Weston et al. [32] proposed a faster algorithm based on [21] by exploiting Fourier-based registration. For further discussion of recent progress in radar odometry, we refer the reader to [27], [33].
It can be noticed that most of the literature on radar odometry has concentrated its efforts on developing feature-based methods, which can be justified by, e.g., reduced computation requirements with a sparse number of points/features. Nonetheless, a sparse set of features does not use all the information available in the radar image (similar to what happens in the mapping domain when one has to choose between landmark and volumetric mapping). Moreover, the computational performance of such methods is proportional to the number of features, leading to a trade-off between computational performance and algorithm robustness. On the other hand, in this article we show that exploiting the analytical characteristics of the direct approach, e.g. the decoupling between rotation and translation, along with the Radon transform and a robust framework for odometry estimation, leads to results comparable with the state-of-the-art radar odometry algorithm and better than all previously documented direct radar odometry approaches.

III. THEORY
In this section, the fundamental theoretical aspects used to develop R3O are explained.

A. Phase-Correlation

1) Introduction:
The Generalized Cross-Correlation is frequently used for estimating shifts between two 1D or 2D signals [34], [35]. Among the numerous filters that can be used in a Generalized Cross-Correlation framework, the Phase-Transform filter is one of them. When such a filter is employed, the correlation method is named Phase-Correlation (PC).
Consider two signals x_1(m) and x_2(m) such that x_1, x_2 ∈ R and m is the independent variable in the spatial domain. Let x_2 be a shifted copy of x_1, shifted by Δ_m. By using the shift property of the FT, we have that the spectra of the two signals are related by X_2(u) = e^(−j2πuΔ_m) X_1(u), where u is the independent variable in the frequency domain, and X_1(u) and X_2(u) are the spectra of x_1(m) and x_2(m), respectively. The cross-spectrum is defined as

S(u) = X_2(u) X_1*(u), (1)

where (·)* is the conjugate of (·). Finally, the phase-normalized cross-spectrum C(u) can be found by dividing the cross-spectrum by its magnitude. Therefore, C(u) can be written as

C(u) = X_2(u) X_1*(u) / |X_2(u) X_1*(u)| = e^(−j2πuΔ_m). (2)

The next step for finding the shift consists of taking the inverse Fourier transform (IFT) of C(u), which results in the phase-correlation vector denoted by Υ, or, extending to two dimensions, the phase-correlation matrix denoted by Ξ. To conclude, the shift Δ_m is found by locating the highest peak in Υ or Ξ.
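The shift-recovery procedure above can be sketched with NumPy for the 1D case (a minimal sketch; the function and variable names are our own, and only integer shifts are recovered here):

```python
import numpy as np

def phase_correlation(x1, x2):
    # Spectra of the two signals
    X1, X2 = np.fft.fft(x1), np.fft.fft(x2)
    # Cross-spectrum, then Phase-Transform normalization
    C = X1 * np.conj(X2)
    C = C / np.maximum(np.abs(C), 1e-12)  # guard against division by zero
    # The IFT of C is the phase-correlation vector; its highest peak is the shift
    upsilon = np.real(np.fft.ifft(C))
    return int(np.argmax(upsilon))

rng = np.random.default_rng(0)
x1 = rng.standard_normal(128)
x2 = np.roll(x1, 17)              # circularly shift by 17 samples
print(phase_correlation(x2, x1))  # 17
```

Note the argument order: with the convention C(u) = X_2(u) X_1*(u), the peak appears at the shift of the first argument relative to the second.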
Choosing the Phase-Transform filter for the Generalized Cross-Correlation, i.e., the inverse of the magnitude of the cross-spectrum as shown in (2), can be justified for several reasons. For example, the peak resulting from the IFT of (2) is theoretically a unit impulse function, in comparison to the usually broader peak resulting from the IFT of the non-filtered Generalized Cross-Correlation. Moreover, the PC gives more emphasis to location, e.g. spatial structures, than to energy [36]. Besides, in [37] the PC is evaluated as a robust correlation method.
2) Enhancements to PC: Even though the PC may offer advantages when compared to other Generalized Cross-Correlation options, it is important to note that this approach also has disadvantages. For example, while the PC performs well in the case of frequency-dependent noise, its performance degrades with frequency-independent white noise [38]. Nonetheless, by applying some signal processing techniques, these drawbacks can be mitigated. In [39] it is suggested that limiting the bandwidth of C(u) leads to better results. The justification is that most of the artifacts that reduce the performance of the PC are present in the higher-frequency components, and thus employing a lowpass filter on C(u) improves the PC results. In this article, we use an ideal lowpass filter designed in the frequency domain with a cutoff κ = 0.3 for the PC used in the rotation estimation and κ = 0.6 for the PC used in the translation estimation, which implies that 70% and 40% of the higher-frequency components are rejected by the filter, respectively.
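The ideal lowpass filtering of C(u) can be sketched as follows (a minimal 1D sketch; we assume here that the cutoff κ is interpreted as a fraction of the Nyquist frequency, and the names are our own):

```python
import numpy as np

def bandlimited_pc(x1, x2, kappa=0.3):
    # Phase-normalized cross-spectrum as in the unfiltered PC
    X1, X2 = np.fft.fft(x1), np.fft.fft(x2)
    C = X1 * np.conj(X2)
    C = C / np.maximum(np.abs(C), 1e-12)
    # Ideal lowpass: reject the (1 - kappa) fraction of higher frequencies
    f = np.abs(np.fft.fftfreq(len(x1)))  # normalized frequencies in [0, 0.5]
    C[f > kappa * 0.5] = 0.0
    return int(np.argmax(np.real(np.fft.ifft(C))))

rng = np.random.default_rng(0)
x1 = rng.standard_normal(128)
print(bandlimited_pc(np.roll(x1, 9), x1, kappa=0.3))  # 9
```

With noisy inputs, the bandlimited version tends to produce a cleaner correlation peak than the unfiltered PC, at the cost of a broader main lobe.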
3) Subpixel Estimation: In many applications, the estimation restricted to integer pixel positions using the PC, as demonstrated in Section III-A1, is not enough. As a consequence, a multitude of subpixel methods for the PC have been developed in the literature. These methods diverge in many aspects. For example, a subpixel method might estimate the subpixel shift by using C(u) in the frequency domain or Ξ in the spatial domain. The reader may find an up-to-date review of subpixel methods in [38]. In the present study, the authors have employed the subpixel method explained in [40] because of its relatively high performance reported in [38], [39].

B. Fourier-Based Registration
1) Overview: Fourier-based registration is an efficient registration approach which belongs to the family of area-based methods [38]. The original algorithm proposed in [22] consisted of registering images between which there was scaling, rotation and translation. However, for the purpose of this article, the scaling estimation is not relevant, given that the radar images generated by a scanning radar can be seen as a top-view 2D image around the vehicle where the height of the radar is fixed. Consequently, the theoretical foundations of Fourier-based registration will be explained only with regard to rotation and translation registration.
2) Polar Method: The polar method is the straightforward approach for calculating the rotation and translation between two images. Let the reference image represented in Cartesian coordinates be called I_r(x, y). The moved image I_m(x, y) evolves from I_r(x, y) by applying a clockwise rotation by φ and a translation by (p, q). Hence, I_m(x, y) can be written as

I_m(x, y) = I_r(x cos φ + y sin φ − p, −x sin φ + y cos φ − q). (3)

Let the FT of I_r(x, y) be called F_r(u, v) and the FT of I_m(x, y) be called F_m(u, v). By using the translation and rotation properties of the FT, the two spectra can be associated by

F_m(u, v) = e^(−j2π(up+vq)) F_r(u cos φ + v sin φ, −u sin φ + v cos φ), (4)

so that the magnitude of F_m is a rotated replica of the magnitude of F_r. Therefore, in order to find the rotation between the reference and the moving image, one can represent both magnitude spectra in polar coordinates, and the estimation of the rotation φ can then be achieved by using the PC, where the peak in the angle direction of Ξ represents the rotation between the pair. Finally, one compensates the rotation by rotating one of the images by φ, and the translation component can be found by using the PC between the images in Cartesian coordinates, where the peak of this Ξ represents the estimate of the translation (p, q).
3) Radon Method: Even though Fourier-based registration using the polar (or log-polar) representation is a straightforward option for the estimation of rotation, it might not be the one that leads to the best estimate of φ. The problems mostly arise from the discrete nature of images and of their magnitude spectra, which has been shown to be particularly detrimental to the performance of the straightforward option. For example, because values have to be interpolated onto different grids, larger interpolation errors are typically present, especially in the Cartesian to log-polar conversion, which has led to the design of filters to mitigate these effects [22], [38]. Motivated by this, research has been conducted on alternatives to the polar representation, and one that has shown positive results is to use the Radon transform for obtaining the rotation estimate [41]. The intuition behind the improvement in performance, when compared to the straightforward option, is that the Radon transform alleviates the interpolation errors, because each "pixel" in the Radon domain comes from a summation of pixels which represent an exact line in the magnitude spectrum image, while in the (log-)polar representation the conversion is done pixel by pixel [41] on interpolated grid positions. Effectively, this implies that elements affected by local interpolation errors have a smaller influence, because these local errors are amalgamated by the summation.
The Radon transform R(ρ, θ) of a continuous image f(x, y) is defined as

R(ρ, θ) = ∫∫ f(x, y) δ(x cos θ + y sin θ − ρ) dx dy, (5)

with −∞ < ρ < ∞ and 0 ≤ θ < π, and where δ is the impulse function. It should be noted that in the discrete case finite sums substitute the integrals, and ρ becomes finitely bounded according to the image size. By inspecting the definition of the Radon transform, the aforementioned property is clear: every point defined by an angle θ and a displacement ρ in the Radon domain is constituted by a line integral (or a sum along a line in the discrete case) over the original image. Moreover, a key property of the Radon transform is that if an image is rotated by φ, its Radon transform will be shifted by φ, leading to R(ρ, θ + φ) [42]. By using this property, one can convert the magnitude spectra of the pair of moving and reference images, earlier defined as I_m(x, y) and I_r(x, y), to the Radon domain, and the rotation can then be estimated by phase-correlating the two Radon-domain images, in which the peak in the θ dimension represents the estimated rotation φ. After that, analogously to the polar method, one can compensate the rotation on the Cartesian images in the frequency or spatial domain and use the PC to estimate the translation.

C. Outlier Removal
Outliers are observations which differ from the normal behaviour of the rest of the observations. The Median Absolute Deviation (MAD) is a viable and often attractive option for identifying outliers. This might be explained by the simplicity and effectiveness of the algorithm. For a data set Y with elements Y_m, m ∈ [1, . . ., M], the MAD of the data set, denoted by MAD(Y), is defined as

MAD(Y) = median_m(|Y_m − M̃|), (6)

where M̃ is the sample median. According to [43], [44], the standard deviation of normally distributed data can be estimated using the MAD as

σ̂ = 1.4826 · MAD(Y). (7)

Then, the following evaluation can be done to find outliers:

|Y_m − M̃| / σ̂ > ζ, (8)

where ζ is a threshold value, and Y_m is an outlier when (8) is evaluated as true, i.e. the left-hand side is greater than ζ.
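The MAD-based outlier test can be sketched as follows (a minimal sketch; names are our own, and 1.4826 is the usual consistency constant for normally distributed data):

```python
import numpy as np

def mad_outliers(y, zeta=2.5):
    med = np.median(y)
    mad = np.median(np.abs(y - med))
    sigma = 1.4826 * mad  # robust estimate of the standard deviation
    # Flag samples whose robust z-score exceeds the threshold zeta
    return np.abs(y - med) / sigma > zeta

y = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 8.0])
print(mad_outliers(y))  # only the last sample is flagged
```

Because both the center (median) and the spread (MAD) are robust statistics, a single extreme value such as 8.0 cannot mask itself by inflating the estimated spread, as it would with the mean and standard deviation.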

D. Tukey-Biweight
The final component of this section is the Tukey biweight for location and scale estimation. This estimator belongs to a family of estimators called M-estimators. These estimators aim to attain efficiency, robustness and resistance [45]. The Tukey-biweight location estimator, denoted by μ_tukey, of a set Z with elements Z_i, i ∈ [1, . . ., N], is defined as

μ_tukey = M̃ + ( Σ_{|u_i|<1} (Z_i − M̃)(1 − u_i²)² ) / ( Σ_{|u_i|<1} (1 − u_i²)² ), (9)

where M̃ is the sample median and u_i is given by

u_i = (Z_i − M̃) / (c · MAD(Z)), (10)

where c is a constant that must be chosen. In all our location estimations, we have set c equal to 6. Furthermore, the Tukey-biweight scale estimator, denoted by σ_tukey, is defined as

σ_tukey = sqrt( N Σ_{|u_i|<1} (Z_i − M̃)²(1 − u_i²)⁴ ) / | Σ_{|u_i|<1} (1 − u_i²)(1 − 5u_i²) |, (11)

where u_i is defined in (10). We used c equal to 9 for all scale estimations.
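The Tukey-biweight estimators can be sketched as follows (a minimal sketch under the standard biweight formulas; the exact normalization used by the authors may differ in detail, and the names are our own):

```python
import numpy as np

def tukey_location(z, c=6.0):
    # Biweight location: weighted mean around the median, with samples
    # beyond c * MAD fully rejected (weight zero).
    med = np.median(z)
    mad = np.median(np.abs(z - med))
    u = (z - med) / (c * mad)
    w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
    return med + np.sum(w * (z - med)) / np.sum(w)

def tukey_scale(z, c=9.0):
    # Biweight midvariance-style scale estimate
    med = np.median(z)
    mad = np.median(np.abs(z - med))
    u = (z - med) / (c * mad)
    inside = np.abs(u) < 1.0
    num = len(z) * np.sum(np.where(inside, (z - med)**2 * (1 - u**2)**4, 0.0))
    den = np.sum(np.where(inside, (1 - u**2) * (1 - 5 * u**2), 0.0))
    return np.sqrt(num) / abs(den)

rng = np.random.default_rng(1)
z = np.concatenate([5 + 0.1 * rng.standard_normal(50), [100.0]])
# The biweight location stays near 5 despite the outlier at 100,
# while the plain mean is pulled far away from it.
```

The smooth down-weighting is what distinguishes the biweight from a hard trimmed mean: samples near the median get weight close to 1, and the influence decays continuously to 0 at the rejection boundary.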

IV. PROPOSED METHOD
A. Introduction

R3O employs a careful method for estimating the rotation. First, this is inspired by the observation that radar sensors usually have a lower angular resolution compared to other sensors, e.g. lidar, which demands exceptional performance from the registration algorithm. Second, it is motivated by noting that the rotation estimation is a cardinal step for registering the translation in Fourier-based registration. This holds because a poor rotation estimate does not only imply a bad estimate of that variable on its own, but also affects the estimation of the translation.
The proposed method takes advantage of the previously documented improvement in the performance of rotation estimation when using the Radon domain instead of the (log-)polar representation [41], along with a sequence of steps that are performed for finding φ (see Fig. 1 for a diagram of how the relative motion is estimated by R3O). Note that we assume that the radar is placed facing forward and that the radar Cartesian image has its center collocated with the radar's center.
The robustness of the algorithm can be primarily explained by the use of multiple candidates for the rotation estimation using Radon-domain slices. The use of different displacements can be related to the work of [46], in the sense that the multiple displacements can be understood to resemble different high-pass filters applied to the magnitude spectra, e.g., the higher the displacement, the fewer low-frequency components are used, thus reducing the effects of rotationally dependent aliasing. This kind of aliasing is caused by artifacts that are introduced due to the discretization of the FT, and its effect can deteriorate the performance of the rotation estimation using Fourier-based image registration. For example, [46] has shown that this aliasing component reduces the height of the peak corresponding to the correct rotation.
Moreover, the use of outlier removal can be thought of as a processing step which removes divergent estimates, e.g. those caused by interpolation errors and noise. Furthermore, the Tukey-biweight location estimator, which is a robust measure of central tendency, is used for the estimation of the rotation between two frames. This process is repeated over a window of frames, on which keyframe selection is performed, followed by graph-based local orientation optimization which refines the rotation estimates. Finally, the optimized rotation between frames is used for de-rotating the images in Cartesian spatial coordinates, and the relative translation is found by maximizing the phase-correlation between each pair.
Note that it is possible to halve the range of displacement values to choose from, due to the symmetry of the magnitude spectra of real images and a property of the Radon transform, concretely that R(ρ, θ) = R(−ρ, θ ± 180°). Also, we note that it is possible to achieve similar results by removing the MAD outlier removal and adjusting the constant values in the Tukey-biweight location and scale estimators. However, we decided to keep the outlier removal and the location/scale estimation as different modules, so that other alternatives for location/scale estimation as well as other outlier removal approaches can be straightforwardly tested with the current algorithm's pipeline.
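The core of the rotation estimation, one 1D phase-correlation per selected Radon-domain slice, can be sketched as follows (a minimal sketch; the Radon-domain images are simulated here as arrays whose rows index ρ and columns index θ in bins, with the moving image an exact circular shift of the reference, and the names are our own):

```python
import numpy as np

def rotation_candidates(R_r, R_m, rho_list):
    # One 1D phase-correlation per selected rho-slice; each peak location
    # (in theta bins) is one candidate for the rotation between the frames.
    cands = []
    for rho in rho_list:
        S_m, S_r = np.fft.fft(R_m[rho]), np.fft.fft(R_r[rho])
        C = S_m * np.conj(S_r)
        C = C / np.maximum(np.abs(C), 1e-12)
        cands.append(int(np.argmax(np.real(np.fft.ifft(C)))))
    return np.array(cands)

rng = np.random.default_rng(2)
R_r = rng.standard_normal((64, 180))
R_m = np.roll(R_r, 25, axis=1)  # a rotation appears as a shift along theta
print(rotation_candidates(R_r, R_m, [10, 20, 30, 40]))  # [25 25 25 25]
```

In the real pipeline each candidate is converted from a bin index to an angle using the θ sampling, and the MAD outlier removal and Tukey-biweight estimators are then applied to the candidate set.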

B. Preprocessing
Scanning radar images are contaminated by a series of unwanted artifacts, e.g. multipath, ghost targets and speckle noise, which may reduce the performance of odometry algorithms [5]. Therefore, before performing the odometry estimation, it is sensible to try to reduce the influence of unwanted components. In our algorithm, we have included as a preprocessing step an adaptation of the k-strongest filter [12] tailored for direct odometry. In this filter, for every azimuth bin of the radar polar image, the filter sets to zero all range bins that have a power return smaller than the k-th greatest power return in that azimuth bin, where k is a parameter. Moreover, a threshold, denoted by z_min, is applied to filter out, i.e. in our implementation set their pixel intensity to zero, all the pixels in the range-polar image that have power smaller than z_min. In [12] it was shown that this filtering method achieves better performance with scanning radar than the constant false alarm rate (CFAR) filter. Nonetheless, we differ from [12] because we afterwards employ a direct method instead of a feature-based method. Consequently, we may choose a larger value for the parameter k as well as a smaller threshold for the smallest power return, i.e. we keep a larger number of pixels/targets. The advantage is that while in [12] it was necessary to select the parameters of the filtering algorithm leveraging a trade-off between the robustness of the motion estimation and the computational performance (it was shown in [12] that less conservative filtering leads to higher robustness while more conservative filtering leads to higher computational performance), in our approach we can select the parameters solely based on the odometry performance. Finally, after filtering, the polar image is interpolated to Cartesian coordinates using bilinear interpolation.
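The adapted k-strongest filter can be sketched as follows (a minimal sketch; rows are azimuth bins, columns are range bins, and the names are our own):

```python
import numpy as np

def k_strongest(polar, k, z_min):
    # Keep, per azimuth row, only the k largest power returns that also
    # exceed z_min; all other range bins are set to zero.
    out = np.zeros_like(polar)
    for a in range(polar.shape[0]):
        row = polar[a]
        idx = np.argsort(row)[-k:]     # indices of the k strongest returns
        keep = idx[row[idx] >= z_min]  # additionally enforce the threshold
        out[a, keep] = row[keep]
    return out

polar = np.array([[1.0, 5.0, 3.0, 2.0],
                  [4.0, 0.0, 6.0, 1.0]])
# Each row keeps only its two strongest returns that are at least z_min
filtered = k_strongest(polar, k=2, z_min=3.0)
```

For a direct method, k and z_min can be chosen generously (many surviving pixels), since the subsequent FFT-based steps cost the same regardless of how many pixels are non-zero.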

C. R3O

1) Rotation Estimation:
For every pair of radar images for which the relative motion is estimated, the relative rotation and the uncertainty of the estimate are calculated. In order to explain the procedure for the rotation estimation, it is sensible to define the phase-correlation operator, denoted by P, as

P(s_1, s_2)(α) = F⁻¹{ F{s_1} F{s_2}* / |F{s_1} F{s_2}*| }(α), (12)

where F and F⁻¹ are the FT and IFT, respectively, s_1 and s_2 are the 1D or 2D signals for which P is being calculated, and α represents the arguments of the signals, which can be either a tuple of 2 arguments or a single argument, corresponding to 2D or 1D signals, respectively. Consequently, P's output will be either 2D or 1D depending on the dimension of the evaluated signals.
See Fig. 2 for an overview of how the rotation estimate and its uncertainty are calculated (the methods described in Section III-B would consist of applying a 2D phase-correlation after the transformation to the Radon domain, or of using the polar transform in place of the Radon transform and then calculating the 2D phase-correlation). Given a moving and a reference image in Cartesian coordinates, denoted by I_m and I_r respectively, the first step consists of calculating the FT of each image, and the output of this step is the magnitude of each spectrum. Next, the magnitude spectra are converted to the Radon domain; let R_r(ρ, θ) and R_m(ρ, θ) be the Radon-domain images of the reference image's magnitude spectrum and the moving image's magnitude spectrum, respectively. Afterwards, instead of performing the 2D P(R_r(ρ, θ), R_m(ρ, θ)) to find the estimated rotation, R3O calculates N 1D estimates of the rotation by selecting N slices of the Radon-domain images. This is accomplished by noting that at a fixed ρ value the 2D PC becomes 1D. For example, when ρ is fixed to a selected displacement ρ_k, the 1D PC is calculated by

P(R_{m,ρ_k}(θ), R_{r,ρ_k}(θ)), (13)

where we use the subscript ρ_k on each Radon-domain image to indicate the slice with all the θ values of a selected displacement ρ_k. Therefore, by defining a set W with elements W_k, k ∈ [1, . . ., N], as the initial set of candidates for the rotation estimate, every candidate is calculated by finding the peak of the phase-correlation between the moving and reference Radon-domain images:

W_k = argmax_θ P(R_{m,ρ_k}, R_{r,ρ_k})(θ), (14)

where ρ_k is the selected ρ value coming from a list of selected displacements of length N. Fig. 3 shows an illustration of how a rotation in the Cartesian domain becomes a shift in the Radon domain. Following that, an outlier removal step based on the MAD, as explained in Section III-C, removes anomalous candidates. The surviving candidates are defined as the set X. Finally, the rotation between the frames is estimated by

φ̂ = μ_tukey(X), (15)

and we estimate the uncertainty of the rotation estimate by calculating the Tukey-biweight scale estimate on the final set of candidates X:

σ̂ = σ_tukey(X). (16)

2) Keyframe Selection: The keyframe selection step aims to dismiss radar images which generated high uncertainty in the rotation estimation with regard to the current reference frame, as well as to find the reference frame for the next iteration. Given the current reference frame, denoted by F_0, the algorithm estimates the relative rotation and its uncertainty with regard to the frames F_n, where n ∈ [1, 2, . . ., L], with L being the length of the window. Each estimated rotation is denoted by φ̂_0n and its uncertainty by σ̂_0n, where n ∈ [1, 2, . . ., L]. Next, in order to filter the estimates with high uncertainty, a fixed and an adaptive threshold are used. The fixed threshold, denoted by γ_fixed, removes all the frames F_n for which the corresponding σ̂_0n is greater than the threshold value. In addition, an adaptive threshold is used. This is calculated by

γ_apt = min_n(σ̂_0n) · β_apt, (17)

where β_apt is a constant with β_apt ≥ 1 and min_n(σ̂_0n) is the smallest uncertainty in the current window. Analogously to the fixed threshold, the adaptive threshold removes all the frames F_n for which σ̂_0n is greater than γ_apt. See Fig. 4 for a graphical representation of the keyframe selection step, as well as for an example of how the relative rotation candidates between F_0 and F_n can be distributed after the MAD outlier removal.
In this example, frames 4 and 6 were removed because their uncertainties were greater than γ_apt.
The effect of the adaptive threshold is to avoid frames in the current window with considerably greater uncertainty than the smallest uncertainty in this window. The fixed threshold, on the other hand, ensures that no frame with uncertainty greater than what is deemed acceptable is kept (this matters when even the smallest estimated uncertainty is high, in which case frames with unacceptably high uncertainty could otherwise survive). In the case where all frames' uncertainties are greater than the fixed threshold, only the frame with the smallest uncertainty is kept and selected.
Finally, the surviving frame that is furthest in time from the current reference is selected as the reference frame for the next iteration (see the example in Fig. 4).
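The candidate generation, robust aggregation and keyframe filtering described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the Radon-domain images are NumPy arrays with one row per ρ displacement, uses a plain (non-subpixel) 1D phase-correlation instead of the subpixel method of Section III-A3, and stands in the median and a MAD-derived scale for the exact robust estimators:

```python
import numpy as np

def phase_correlation_1d(a, b):
    """1D phase correlation; returns the circular shift (in bins)
    between two equal-length signals."""
    spec = np.fft.fft(a) * np.conj(np.fft.fft(b))
    spec /= np.abs(spec) + 1e-12            # normalized cross-power spectrum
    corr = np.real(np.fft.ifft(spec))
    shift = int(np.argmax(corr))
    return shift - len(a) if shift > len(a) // 2 else shift

def rotation_candidates(R_ref, R_mov, rho_indices, deg_per_bin):
    """One rotation candidate (in degrees) per selected rho slice of the
    Radon-domain images (rows index the rho displacements)."""
    return np.array([phase_correlation_1d(R_ref[k], R_mov[k]) * deg_per_bin
                     for k in rho_indices])

def robust_rotation_estimate(candidates, mad_k=3.0):
    """MAD-based outlier rejection, then the median as a robust measure of
    central tendency; the MAD-derived scale stands in for the paper's
    Tukey-biweight scale estimate."""
    c = np.asarray(candidates, dtype=float)
    med = np.median(c)
    mad = np.median(np.abs(c - med)) + 1e-12
    x = c[np.abs(c - med) <= mad_k * mad]   # surviving candidates (set X)
    phi = np.median(x)
    sigma = 1.4826 * np.median(np.abs(x - phi))
    return phi, sigma

def select_keyframes(sigmas, gamma_fixed, beta_apt):
    """Keep frames passing both the fixed and the adaptive uncertainty
    thresholds; if none pass, keep the single most certain frame."""
    s = np.asarray(sigmas, dtype=float)
    gamma_apt = s.min() * beta_apt          # adaptive threshold, Eq. (17)
    keep = [i for i in range(len(s))
            if s[i] <= gamma_fixed and s[i] <= gamma_apt]
    return keep if keep else [int(np.argmin(s))]
```

The sign convention of `phase_correlation_1d` depends on which image is taken as reference; what matters is that it is applied consistently across all slices.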

3) Local Orientation Optimization:
We have used a graph-based optimization with a robust loss function in order to refine our local orientation estimation. For that, we consider all pairwise combinations of the frames remaining after the keyframe selection step. The graph is formed by nodes which represent the orientation at frames F_n relative to the reference frame F_0. We denote the nodes as o_i, where i carries the same subscript as the surviving frame F_n, and the vector with all the nodes is defined as o = (o_i)^T ∀i. Between the nodes there are rotation factors with associated uncertainties; these are equal to the relative rotation estimates and their uncertainties obtained with the steps shown in Section IV-C1. The relative rotation constraint and its uncertainty are denoted by φ_ij and σ_ij respectively, where i is the subscript of the node where the edge starts and j is the subscript of the node where the edge ends. An example of the orientation graph is shown in Fig. 5.
The objective function of the local orientation graph optimization is written as [47], [48], [49]

o* = argmin_o Σ_{(i,j)∈V} ρ_cauchy( e(o_i, o_j)² / Σ_ij ),   with e(o_i, o_j) = o_j − o_i − φ_ij,

where V is the set of pairs of indices for which a constraint in o exists, e is the error function which evaluates how well the parameters satisfy the constraints, Σ_ij is the uncertainty of the measurement, which is equal to σ_ij, and ρ_cauchy is the Cauchy loss function, commonly defined as

ρ_cauchy(s) = c²_cauchy · log(1 + s / c²_cauchy),

where c_cauchy is a constant. Finally, obtaining the relative rotation between successive nodes requires only the difference between their local orientations. For example, for an orientation vector o = (o_0, o_1, o_2)^T, the estimated rotation φ between nodes o_1 and o_2 is φ = o_2 − o_1. In order to solve the graph, we use GTSAM (https://github.com/borglab/gtsam) [49].
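The local orientation optimization can be prototyped with a generic robust least-squares solver. The following sketch uses SciPy's built-in Cauchy loss in place of GTSAM, with the error e(o_i, o_j) = o_j − o_i − φ_ij and the reference node o_0 anchored at zero; the function name and edge encoding are illustrative, not the paper's API:

```python
import numpy as np
from scipy.optimize import least_squares

def optimize_orientations(edges, num_nodes, c_cauchy=1.0):
    """Refine local orientations from pairwise rotation constraints.

    edges: list of (i, j, phi_ij, sigma_ij), each encoding the constraint
    o[j] - o[i] ~ phi_ij with uncertainty sigma_ij. Node 0 is the
    reference frame and is fixed at orientation 0.
    """
    def residuals(o_free):
        o = np.concatenate(([0.0], o_free))   # anchor the reference node
        return np.array([(o[j] - o[i] - phi) / sigma
                         for i, j, phi, sigma in edges])

    sol = least_squares(residuals, np.zeros(num_nodes - 1),
                        loss='cauchy', f_scale=c_cauchy)
    return np.concatenate(([0.0], sol.x))
```

The relative rotation between two nodes is then just the difference of the optimized entries, e.g. `o[2] - o[1]`.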
4) Global Orientation Estimation: Over the course of the trajectory, the orientation Φ_n of the agent with respect to an inertial frame (with a given initial orientation Φ_0) is calculated by summing all the relative rotations up to n, i.e., Φ_n = Φ_0 + Σ_{k=1}^{n} φ_k for n > 0.

D. Position Estimation
After the local orientation is estimated, we use the sequence of the closest frames from the current reference frame to the next reference frame in order to calculate the translation in this window. For example, in the illustration shown in Fig. 4, the translation would be computed between the successive surviving frames of the window. By using the appropriate φ for each pair, the rotation is compensated in the Cartesian representation and the translation is estimated by applying the operator P to the Cartesian images. However, in order to get the position, it is necessary to convert the displacement from local coordinates into the displacement with respect to the inertial frame. We assume that p and q at instance k represent the displacement between instances k and k − 1 in the vehicle's longitudinal and lateral directions, respectively. Therefore, the position (x_k, y_k) for k > 0 and for a given x_0 and y_0 can be found by

x_k = x_{k−1} + p_k cos(Φ_k) − q_k sin(Φ_k),
y_k = y_{k−1} + p_k sin(Φ_k) + q_k cos(Φ_k).

V. EVALUATION

A. Introduction
The goals of this section are to evaluate and compare the performance of the proposed algorithm with: (1) the current state-of-the-art radar odometry algorithm [12], which reported its results on the Oxford [50] and Mulran [20] datasets; (2) the direct methods with the best reported performance, namely [21] on the Oxford dataset and [31] on the Mulran dataset. Consequently, we have evaluated R3O using the Mulran and Oxford datasets.
In order to show that the algorithm did not have its parameters overly calibrated for one dataset or another, we kept all the parameters of the core part of the algorithm fixed. All in all, the only changes between the datasets occurred in the preprocessing stage, which is explained by noting that the

TABLE I PARAMETERS USED FOR ALGORITHM'S EVALUATION
two datasets use different scanning radars with, e.g., different maximum range. For all the PC operations, the subpixel method explained in Section III-A3 was used along with the enhancement described in Section III-A2. All the parameters used in the evaluation can be seen in Table I. Over the datasets evaluated and with the parameters described, our algorithm was implemented in Python and tested on a computer with two Intel E5-2695 CPUs and two Nvidia Quadro RTX 4000 GPUs (using a second thread to compute all the steps after the keyframe selection), obtaining an output rate of approx. 4.57 Hz, which makes it suitable for online utilization given that both radars have an output rate of 4 Hz. Finally, because the radar was attached to a car, we have used only the translation component in the longitudinal direction for the position update.
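The position update of Section IV-D can be sketched as straightforward dead reckoning; the lateral component q is kept in the code for completeness and, under the longitudinal-only simplification mentioned above, is simply passed as zero (the function name is illustrative):

```python
import numpy as np

def integrate_poses(phis, ps, qs, x0=0.0, y0=0.0, phi0=0.0):
    """Accumulate per-frame rotations and body-frame displacements
    (p longitudinal, q lateral) into inertial-frame positions."""
    x, y, Phi = x0, y0, phi0
    xs, ys = [x0], [y0]
    for dphi, p, q in zip(phis, ps, qs):
        Phi += dphi                             # global orientation update
        x += p * np.cos(Phi) - q * np.sin(Phi)  # rotate body displacement
        y += p * np.sin(Phi) + q * np.cos(Phi)  # into the inertial frame
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)
```

For example, two 1 m longitudinal steps with a 90° turn before the second step trace out an L-shaped path.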

B. Oxford
The Oxford Radar dataset uses a vehicle equipped with an assortment of sensors, including a Navtech CTS350-X scanning radar with a range resolution of 4.38 cm, 0.9° azimuth resolution, a maximum range of 163 m, and an output rate of 4 Hz. This dataset has a total of 32 traversals over approximately the same route in Oxford, U.K. under different weather, traffic and lighting conditions. Moreover, ground truth is provided based on camera, GNSS, and inertial navigation system information [50]. For evaluating the performance of the algorithm, we have used the KITTI odometry evaluation [51], which consists of computing the errors in translation and rotation over segments of length [100 m, 200 m, ..., 800 m], followed by averaging the errors obtained over the different lengths. We used this metric because it was the metric used by the other radar odometry algorithms which assessed their performance on this dataset. Finally, among the 32 possible traversals, we selected the same 8 sequences used by [12], [14], [30].
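A simplified, translation-only sketch of this segment-based evaluation is shown below. Note that the full KITTI metric also evaluates rotation and expresses segment errors relative to the ground-truth poses; this version only compares displacement vectors over fixed path lengths, and the function names are illustrative:

```python
import numpy as np

def trajectory_lengths(positions):
    """Cumulative path length along a trajectory of (x, y) positions."""
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    return np.concatenate(([0.0], np.cumsum(steps)))

def kitti_translation_error(gt, est, seg_lengths=range(100, 900, 100)):
    """Average relative translation error (%) over fixed path lengths,
    in the spirit of the KITTI odometry metric (translation only)."""
    gt = np.asarray(gt, float)
    est = np.asarray(est, float)
    dists = trajectory_lengths(gt)
    errors = []
    for L in seg_lengths:
        for i in range(len(gt)):
            # first index whose travelled distance exceeds i's by L
            j = np.searchsorted(dists, dists[i] + L)
            if j >= len(gt):
                break
            gt_delta = gt[j] - gt[i]
            est_delta = est[j] - est[i]
            errors.append(np.linalg.norm(est_delta - gt_delta) / L)
    return 100.0 * np.mean(errors) if errors else float('nan')
```

On a trajectory whose estimate is uniformly scaled by 1%, this metric reports a 1% translation error for every segment length, as expected.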
A comparison between the trajectories estimated by our algorithm and the ground truth can be seen in Fig. 6. Moreover, Table II shows a comparison between R3O's performance and other radar-based odometry algorithms. Among the direct methods, R3O obtained the best performance in terms of translation and rotation error. In addition, compared with all algorithms (direct and feature-based), R3O had the second best performance in terms of rotation error.

C. Mulran
This dataset includes a variety of sequences covering, e.g., dynamic scenarios over urban areas. The radar used in the dataset was a Navtech CIR204-H, a 360° scanning radar with 0.9° angular resolution and 0.06 m range resolution. This radar also has an output rate of 4 Hz and a 200 m maximum range. Moreover, the dataset provides a ground-truth trajectory which is calculated by performing SLAM using a fiber optic gyroscope (FOG), a Virtual Reference Station Global Positioning System, and the iterative closest point (ICP) algorithm. More details of the dataset can be found in [20]. Among the sequences provided by the dataset, we have used the DCC, Riverside and KAIST sequences (the same 9 sequences evaluated in [12], [14]) for evaluating our proposed algorithm. Riverside is a sequence with straight runs along a river and two bridges, DCC is structurally distinctive, including a mountain and crossroads, and KAIST is in a campus environment.
For evaluating the performance of the algorithms on this dataset, in addition to the KITTI odometry metric, we have used the relative pose error (RPE) explained in [52], [53]. This was motivated by noting that both [12] and [31] used this metric in their evaluations on the Mulran dataset. The RPE consists of measuring the difference between the estimated and the true motion over a fixed time, frame or spatial interval, followed by averaging the errors. Because [12] and [31] use different intervals for their RPE evaluation, it was necessary to compute the RPE using different intervals: (1) for the comparison with [12], we computed the RPE in translation with an interval equal to 1 frame; (2) for the comparison with [31], we computed the RPE in rotation angle and the RPE in translation with 5 different spatial interval parameters corresponding to 10%, 20%, 30%, 40%, and 50% of the total trajectory length.
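A translation-only sketch of the RPE with a fixed frame interval follows; the full RPE of [52], [53] operates on relative poses including rotation, whereas this simplification compares per-interval displacement vectors only:

```python
import numpy as np

def rpe_translation(gt, est, step=1):
    """Relative pose error in translation with a fixed frame interval:
    mean norm of the difference between estimated and true per-interval
    displacements (orientation alignment omitted)."""
    gt = np.asarray(gt, float)
    est = np.asarray(est, float)
    d_gt = gt[step:] - gt[:-step]
    d_est = est[step:] - est[:-step]
    return float(np.mean(np.linalg.norm(d_est - d_gt, axis=1)))
```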
A comparison between the trajectories estimated by our algorithm and the ground truth can be seen in Fig. 7. In addition, Table III shows a comparison between R3O's performance and other radar-based odometry algorithms using the KITTI evaluation. The results in this table show that the proposed algorithm achieved the best performance in terms of rotation error and the second best performance in terms of translation error. Moreover, the results using the RPE with a step equal to 1 frame are displayed in Table IV, which shows that our algorithm surpassed the state of the art [12] in terms of translation RPE.
Furthermore, we calculated the RPE with the same steps as Park et al. for the Riverside03 and DCC02 trajectories (the 2 trajectories from [20] used by [31]). The boxplots of the RPE for the different segment sizes of the estimators are shown in Fig. 8. Finally, the comparison between the performance of R3O and PhaRaO [31] is shown in Table V, from which it is evident that our algorithm achieved better performance than [31]. For example, in the DCC02 sequence, the proposed algorithm had 38% less translation error than [31].

A. Introduction
In this section we evaluate the effect of each step of the algorithm on its overall performance in terms of rotation estimation. Along with that, we also compare our algorithm with Fourier-based registration using the polar representation of the magnitude spectra, as well as with computing a 2D PC over the 2D Radon image of the spectra, which we will refer to as the Rad2D algorithm (the angle bins had a spacing of approx. 0.35° for all algorithms). This is motivated on the grounds that we can not only assess the improvements of R3O compared to Rad2D and polar, but also the impact of each cardinal step in the R3O algorithm. Furthermore, we include the influence of the number of ρ candidates on the algorithm's performance, along with the rationale for an appropriate selection of candidates in R3O.
In this analysis, akin to [12], we have used the 32 sequences of the Oxford dataset, which amount to 240088 radar images. Moreover, the parameters of R3O and the preprocessing match Table I and Section V. Finally, exactly as in Section V, we have calculated the Radon transform of magnitude spectra of size 1023×1023 using a distance between displacements equal to one pixel. This implies a total of 1023 possible ρ candidates, i.e. ρ = [−511, −510, ..., 0, ..., 510, 511].

B. Analysis of Individual Candidates
In this analysis, we used only one single ρ for estimating the rotation, i.e. the rotation estimate of each possible candidate before the outlier removal stage (see Fig. 2). Fig. 9 shows the rotation error, using the KITTI odometry metric averaged over all sequences' errors, versus the displacements. The symmetry around zero is a known property already alluded to in Section IV-A; this implies that we can effectively use only half of the displacements. Furthermore, it is also visible in Fig. 9 that the candidate values near zero are those which yield the lowest rotation error. Since the performance of the ρ candidates is symmetrical, for simplicity, from now on we will show the results using only one of the halves of the candidates.
Moreover, we performed a complementary evaluation in order to narrow down the candidates which are meaningful for attaining the correct rotation estimation. For that, using the RPE with a step equal to 1 frame, we estimated the frequency with which each candidate estimates a rotation with an error within ±1 angle bin. Therefore, for each RPE estimation, we evaluated whether the magnitude of the error was smaller than 0.35°. Finally, we divided the number of occurrences of errors within the acceptable boundary by the total number of error evaluations. The results of this experiment can be seen in Fig. 10(a); it is noticeable that already at a displacement equal to 81 the ratio is below 0.5, and the ratio decreases further as the displacement increases. Moreover, by observing Figs. 9 and 10(a), we can see that at the farthest displacements the error goes down and the ratio increases, respectively. This can be explained by noticing that at far displacements the number of pixels used in the sum for the Radon transform is small, and these pixels correspond only to the highest frequency components of the magnitude spectra; thus the sum approaches zero, and consequently the rotation estimation of slice pairs at far displacements is very often equal to zero. Therefore, when no rotation actually occurred, these far displacements get the correct result. Fig. 10(b) shows the ratio over the single displacements considering only the samples where there was at least 1° of rotation between them. By juxtaposing the two figures of Fig. 10, the aforementioned rotation estimation characteristics at far displacements become evident.
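The within-boundary ratio described above reduces to a simple computation; given per-frame rotation errors in degrees for one candidate, a sketch is:

```python
import numpy as np

def within_bin_ratio(rotation_errors_deg, bin_deg=0.35):
    """Fraction of single-candidate rotation errors whose magnitude is
    within one angle bin (0.35 degrees) of the true rotation."""
    e = np.abs(np.asarray(rotation_errors_deg, dtype=float))
    return float(np.mean(e <= bin_deg))
```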

C. Ablation Study
After acquiring the information about individual performance in the preceding subsection, we now observe the cumulative effect of the candidates as well as the impact of each step on R3O's rotation estimation. For that, we evaluate the algorithm at its different stages with different numbers of candidates. The candidates are selected by always starting from displacement zero up to the number of candidates minus 1, i.e. when the number of candidates is N, we use the ρ values [0, 1, ..., N − 1]. In this study we evaluated the algorithm with numbers of candidates in [10, 11, 12, ..., 512]. Moreover, we evaluated the algorithm in 4 different stages: (1) Full: R3O using all its stages; (2) No-Graph: in which we do not use the graph optimization; (3) No-KF: in which we use neither the graph optimization nor the keyframe selection; (4) Basic: No-KF without outlier removal. In this study, we used the rotation error from the KITTI odometry metric averaged over all sequences' errors.
Fig. 11 shows the rotation error over the number of candidates, including also the rotation errors of polar and Rad2D. In Fig. 11 it is possible to note that:
- When few candidates are used, fewer than 27, the keyframe selection and consequently the graph optimization fail to improve the results. This can be understood by considering that the effectiveness of the keyframe stage requires the uncertainty estimation to be reliable, which at low sampling numbers does not necessarily occur. We denote this region as the Dearth Region.
- As the number of candidates increases, there is likewise an improvement in the rotation estimation until it reaches a plateau. We denote this plateau as the Optimal Region. Using a number of candidates within this region leads to the best performance (see Table VI).
- After the plateau, we notice that increasing the number of candidates does not lead to an improvement in performance. This can be understood with the aid of Figs. 9 and 10. We believe that at this point there is a high number of candidates whose estimations are far from the correct rotation; consequently, there is no consistent improvement from using them. Moreover, as outliers become more prevalent, the chance increases that some will survive the outlier removal, which also deteriorates the performance of the keyframe selection and graph optimization.
- There is a relatively small but consistent improvement from using the graph optimization.
- After the Optimal Region, as the number of candidates increases, the importance of the outlier removal grows. It is noticeable that at a very large number of candidates, the performance gap between the Basic and the other versions increases.

D. Computational Time
Finally, we compare the computational time of R3O using different numbers of candidates with Rad2D and polar. For all the evaluations, we use the same machine described in Section V. Fig. 12 shows the computational time. It can be seen that R3O's computational time grows as the number of candidates increases. The preprocessing step is almost constant, while the rotation estimation (before the keyframe step) grows with the number of candidates. Moreover, the keyframe stage has a constant and almost negligible effect on the algorithm's computational time, and the last stages of the algorithm also have a relatively small effect on the total computational time (note that these stages run in a second thread, which explains the nonlinearity in time complexity, i.e. they can take a shorter or longer time based on the resources which the second and the main thread demand concurrently). Furthermore, we see that the polar-based algorithm has a lower computational time than Rad2D due to the additional time complexity of the Radon transform. Moreover, we notice that at the number of candidates used in Section V, R3O has almost the same time complexity as Rad2D.

VII. CONCLUSION
A method for odometry estimation using radar sensors has been proposed in this article. The theoretical foundations of the method and the details of its implementation were explained. Moreover, R3O was compared to other direct and indirect odometry methods. It was shown that R3O surpasses all other direct radar odometry methods, and it challenges the state-of-the-art radar odometry algorithm on the Mulran dataset. Furthermore, evaluating the algorithm across different datasets allowed us to assess that the proposed algorithm maintains consistent performance when used with different scanning radars under various environments and lighting, traffic and weather conditions.
To conclude, we believe that the pipeline demonstrated in this article, especially the generation of multiple hypotheses for the rotation estimation by exploiting the Radon transform, can represent a new cornerstone for direct odometry methods. The framework introduced in R3O can be incorporated into previous approaches, e.g. [31], [32], as well as into a hybrid approach combining dense and feature-based techniques. Moreover, R3O can also be used with other sensor modalities, e.g. 2D lidar, by adjusting the preprocessing step.

Fig. 3. Example of how a relative rotation between two Cartesian images becomes a shift in the Radon domain. In this example we manually rotated one Cartesian radar image by 45°. From left to right we have: (1) the images in Cartesian coordinates; (2) the magnitude spectra of the images in dB (we plot in dB to facilitate the visualization, but we use the linear scale for calculations); (3) the Radon domain images of the magnitude spectra (note that ρ = 0 coincides with the middle/center row, and the rows above and below the center correspond to negative and positive displacements respectively); (4) magnitude vs. angle plots for some selected displacements.
Fig. 4. Keyframe selection starts by estimating the rotation and the uncertainty between the reference frame and the frames in the current window. In this example, the uncertainties between F0 and F4,6 were greater than the defined threshold parameters. The reasoning for dismissing F4,6 is elucidated by the boxplot (with whisker limits based on 1.5 times the interquartile range), where the larger dispersion of their estimations is clear. Therefore F4,6 were not used for this local orientation estimation. Note that F5 will be the reference frame in the next iteration. (a) Scatter plot showing the distribution of the surviving candidates after the MAD outlier removal. (b) Boxplot of the surviving candidates after MAD outlier removal. (c) A graphical representation of the keyframe selection. (d) A graphical representation of the keyframe selection after the thresholds were applied.

Fig. 5. Illustration of the orientation graph used in R3O. In this example, the keyframe window has length 3 and the frame F2 was pruned in the keyframe selection step. The nodes [o0, o1, o3] represent the local orientations in the iteration. Also, [φ01, φ03, φ13] are the rotation factors between the nodes.

Fig. 9. Rotation error of a single candidate using the KITTI evaluation. It is possible to observe that candidates near the center have the smallest errors.

Fig. 10. Ratio with which the single-candidate rotation estimation was within the defined boundary. (a) The ratio over all available samples. (b) The ratio when considering only samples for which at least 1° of rotation occurred between the pair of radar images.

Fig. 11. Ablation study. The effects of each stage of the algorithm on its overall performance are noticeable, as is the fact that these effects are directly linked to the number of candidates evaluated.

Fig. 12. R3O's computational time. It is possible to observe that the rotation estimation (before the keyframe stage) has the highest impact on the algorithm's computational time.

- Rad2D's performance is better than polar's. Nonetheless, any version of R3O in the Optimal Region outperforms Rad2D.

TABLE IV RESULTS WITH THE MULRAN DATASET - RPE WITH STEP EQUAL TO 1 FRAME

TABLE V RESULTS WITH THE MULRAN DATASET - RPE WITH STEPS USED IN [31]

TABLE VI RESULTS OF THE ABLATION STUDY