ForestTrav: 3D LiDAR-Only Forest Traversability Estimation for Autonomous Ground Vehicles

Autonomous navigation in unstructured vegetated environments remains an open challenge. To successfully operate in these settings, autonomous ground vehicles (AGVs) must assess the environment and determine which vegetation is pliable enough to safely traverse. In this paper, we propose ForestTrav (Forest Traversability): a novel lidar-only (geometric), online traversability estimation (TE) method that can accurately generate a per-voxel traversability estimate for densely vegetated environments, demonstrated in dense subtropical forests. The method leverages a salient, probabilistic 3D voxel representation, continuously fusing incoming lidar measurements to maintain multiple per-voxel ray statistics, in combination with the structural context and compactness of sparse convolutional neural networks (SCNNs), to perform accurate TE in densely vegetated environments. The proposed method is real-time capable and is shown to outperform state-of-the-art volumetric and 2.5D TE methods by a significant margin (0.62 vs. 0.41 Matthews correlation coefficient (MCC) score at 0.1 m voxel resolution) in challenging scenes and to generalize to unseen environments. ForestTrav demonstrates that lidar-only (geometric) methods can provide accurate, online TE in complex, densely vegetated environments, a capability that has not previously been demonstrated in the literature in such complex settings. Further, we analyze the response of the TE methods to the temporal and spatial evolution of the probabilistic map as a function of the information accumulated over time during scene exploration. This analysis shows that our method performs well even with limited information in the early stages of exploration, and it provides an additional tool to assess the expected performance during deployment. Finally, to train and assess TE methods in highly vegetated environments, we collected and labeled a novel, real-world data set and provide it to the community as an open-source resource.


I. INTRODUCTION
Autonomous navigation of ground vehicles in unstructured vegetated environments is essential for many robotics applications but remains an open challenge. Any off-road autonomous navigation requires accurate traversability estimation (TE), which can be defined as the determination of which parts of the environment a given robot can or cannot travel through safely. Existing approaches often assume the environment is rigid. However, in natural environments, this assumption is overly prohibitive, as successful navigation requires the robot to push and pass through pliable vegetation.
Unstructured vegetated environments such as forests are also challenging to navigate through because vegetation clutter can result in complex geometric structures with overhanging elements and occlusions. TE in such environments requires a high-fidelity representation of the environment and a method that can leverage the representation with computational efficiency and generate accurate predictions.
Navigation strategies in unstructured environments without vegetation, relying on high-fidelity geometric representations, have proven sufficient in various complex scenarios [1]. These approaches commonly represent the environment using 2D or 2.5D maps (one elevation value z per position (x, y) on a plane), grid maps (discretized 2D maps), or elevation maps to assess traversability [2]. These methods rely on geometric features or heuristics generated from these representations. However, 2.5D maps cannot sufficiently represent complex vegetated environments. Thus, geometric TE approaches have often been considered insufficient in the recent literature, and image-based modalities have been considered essential for accurate TE [3], [4], [5], [6].
Single-image appearance-based TE methods are adequate for many tasks, e.g. following gravel roads with sparse vegetation and navigating greenhouses [7], [8]. They can leverage the high-fidelity information contained in images in conjunction with strong assumptions about the environment [3], [6]. However, these methods can only assess the next step locally and discard previous information. Hence, single image-based methods are more prone to failure than probabilistic map representations, due to sensor occlusions or image degradation [5].
Hybrid methods aim to combine geometric and appearance-based methods, leveraging the discriminative power of image-based modalities with geometric representations [5], [9], [10], [11]. However, the underlying geometric component fails to adequately represent the environment, degrading the performance of the image-based components, as shown in Fig. 9 and the accompanying video.
In previous work [12], we demonstrated the feasibility of TE in complex vegetated environments using a 3D voxel-based representation containing multiple ray statistics, color fusion, and consideration of a voxel's neighborhood. While the method showed strong results in offline processing, it was unsuitable for real-world deployment due to computational cost constraints and performance degradation while exploring a new environment. The method also showed limitations in the highly challenging, densely vegetated environments such as those addressed in this work. Sparse convolutional neural networks (SCNNs) have shown low inference times for sparse voxel representations, which are suitable for TE and scene completion in structured environments [13]. However, their use has not been explored for TE in vegetated environments. Our work leverages these findings and presents a novel TE approach suitable for real-world deployment (in terms of accuracy and inference time) in complex vegetated environments such as those shown in Fig. 1.
In this work, we claim the following contributions.
• We propose a novel method for traversability estimation named ForestTrav (Forest Traversability). The proposed method represents the environment with a high-fidelity and feature-rich 3D voxel representation relying only on range sensor data. In addition, it leverages the structural context and sparseness properties of SCNNs to allow for rapid inference. The small model size (approximately 2 million parameters [1.7 MB] per model) allows for data- and time-efficient training and deployment. The data set size required for training is economical, and the method is trained exclusively on real-world data.
• We demonstrate the suitability of the proposed method for TE in complex natural terrain such as a highly vegetated forest environment. The proposed method is thoroughly evaluated on a challenging real-world data set and is shown to significantly outperform state-of-the-art (SOTA) methods in complex scenes and to generalize to unseen environments.
• We propose an evaluation metric to assess the robustness of an estimator's performance with respect to the temporally evolving 3D probabilistic map. It quantifies performance changes that can occur while the robot moves into a new, unseen area. This is not addressed in the current TE literature.
• Finally, we provide open-source access to our novel data set containing voxelized input features and output labels generated by our accurate labeling method. We provide nine different forest environments with varying degrees of vegetation. Our labeling method combines robot experience and expert domain knowledge in a probabilistic fashion.

This work and its contributions address the research gap of TE for autonomous ground vehicles (AGVs) in vegetated environments, specifically in full 3D, leveraging only range sensors. The presented method can be considered a ''geometric'' method, relying solely on lidar measurements. Most prior works claim that TE in vegetated environments requires appearance-based modalities. We show that this is not necessarily true, and our approach provides a highly accurate geometric solution shown to outperform prior methods. We believe that making the data available is extremely important, as we have observed that the definition and complexity of ''forest environments'' varies significantly across different works in the literature. Although subjective, we argue that the forest navigation scenarios dealt with in this paper, which can be seen in our accompanying video, are more complex and challenging than most presented in prior work, further highlighting the merit of our method.

II. RELATED WORK
In this section, we discuss prior work relevant to traversability estimation for unstructured, vegetated environments. Traversability estimation, or terrain assessment, is the assessment of a patch of terrain in order to determine a ground robot's ability to enter, reside in, and exit that patch or volume without entering a failure state [14]. In the binary case, this can be defined as either safe to traverse or not. In the continuous case, TE can be seen as the likelihood of entering a failure state. In the autonomous navigation literature, environments are characterized by the degree of structure humans impose on an environment. Structured environments are commonly man-made (e.g. offices, roads). Semi-structured environments are areas where some degree of structure is imposed, but there are uncontrolled elements, e.g. agriculture or driving on gravel roads (off-road). This work focuses on unstructured environments, where no human-imposed order or structure is introduced. Hence, research conducted in the area of autonomous navigation for agriculture, construction, or off-road driving is often not suitable for our case, since these areas make strong assumptions about the environment based on the imposed structure. We refer the interested reader to broader surveys in these areas [15], [16].
To classify an environment as ''vegetated'' is inherently subjective. In different works, it can encompass natural areas devoid of overhanging foliage, a large area with a few trees [17], or an area with a cluster of trees in a park or forest devoid of undergrowth [9]. Consequently, there is large variation in the works presented in the context of vegetated environments, or more specifically forests. Our video depicts an example of a scene from our data set that, compared to the literature, contains significantly denser vegetation with significantly more obstacles and clutter both at and above ground height.

A. GEOMETRIC TRAVERSABILITY ESTIMATION IN VEGETATED ENVIRONMENTS
Geometric TE methods rely on reconstructing and maintaining a geometric representation of the environment using range sensors, RGBD cameras, or stereo-vision, usually assuming a rigid environment. The recent DARPA SubT Challenge required robots to navigate large-scale, complex underground environments and find and report object locations autonomously. The top two teams relied on geometric methods generating 2.5D cost maps [1]. The second-placed team generated 2.5D cost maps from a 3D probabilistic occupancy map and relied on classical heuristics, e.g. maximum slope angle and step height [2]. The winning team deployed a probabilistic GPU-based 2.5D elevation map with high grid resolution (4 cm) and relied on learning-based TE [18]. 2.5D representations, however, suffer in the presence of clutter and overhanging elements, as they cannot naturally represent them, which results in viable paths being closed off.
TE approaches in vegetated environments relying solely on lidar aim to capture the environment in 3D with additional salient features. Point-cloud-based approaches [4], [19] have aimed to generate salient features, such as eigenvalue features and slope inclination, without a probabilistic representation and have had little success. They suffer from high computational costs, low fidelity for discrimination, and noise from outliers. Probabilistic 3D voxel grids have been able to address the memory and computational issues at the cost of discretization [20], [21], [22]. Octomap [20] efficiently models occupancy of the environment but lacks any additional salient features for accurate TE in complex unstructured environments. Saarinen et al. [21] introduced sub-voxel resolution occupancy mapping, which was subsequently expanded to the Normal Distributions Transform Traversability Map (NDT-TM) [4]. NDT-TM introduced salient ray features (intensity, permeability, roughness, and slope angle) to estimate TE in vegetated environments at 0.4 m voxel resolution. Our prior work further enhanced this by fusing lidar multiple returns (or echoes), color, and adjacency features into the map representation [12]. This allowed for TE in challenging vegetated environments on post-processed maps [12] at higher resolutions (0.1 m). However, this method suffers from computational constraints and accuracy degradation during deployment, when sensor readings are received continuously on the robot rather than post-processed. Maintaining additional ray-based features per voxel comes at a high computational and memory cost and is therefore rare for real-time systems at high resolution. Recent work has shown that SCNNs are suitable for TE in complex structured environments by leveraging contextual information of the environment [13] whilst being computationally efficient. The method was trained purely in simulation using 3D occupancy and required a large data set, equivalent to 57 years of real-world experience, a quantity that would be difficult and costly to obtain in real vegetated environments.

B. VISION-BASED TRAVERSABILITY ESTIMATION
In scenarios with vegetation, geometric-only methods are commonly considered insufficient, as they lack discriminative power. Hence, many studies have aimed to solve TE in these environments using solely image modalities. These methods aim to assess TE and plan from a single image (single viewpoint) [3], [6], [23]. Classically, they aim to classify the terrain into different semantic classes and assign each semantic class a traversability score. Class drift and the often fluid boundaries between classes, e.g. bush vs. bramble, can make this challenging. Nevertheless, these methods have shown high accuracy in complex situations but can be prone to reliability issues, e.g. sensor reading degradation or camera occlusions [5], as they gather information from a single input and retain no memory.

C. HYBRID METHODS
Late-fusion methods combine multiple representations that can stem from different sensor modalities or estimation processes, e.g. fusing semantic segmentation with a 2.5D elevation map. Maturana et al. [11] demonstrated the fusion of semantics and 2.5D height maps for off-road (gravel road) navigation, avoiding vegetation where possible. Semantic and spatial cost maps (e.g. semantics, height difference, ceiling height) for high-speed off-road driving using stereo-vision have been successfully implemented [17]. The authors emphasize fast inference time at the cost of (geometric) representation fidelity and aim to avoid complex areas. Bradley et al. [5] demonstrated a TE approach separating the environment into a support surface and an above-ground point cloud classification, using classical machine learning and semantic segmentation on a data set collected in different challenging environments. Similarly, an approach relying on semantic segmentation for traversability assessment and support surface estimation from an RGBD sensor has shown good results [10]. Frey et al. [9] demonstrated an incremental, online learning approach that combines a traversability signal with anomaly detection. Real-time operation is shown in natural environments, mostly parks and hiking trails with light vegetation. The authors claim a vision-only approach but project their image-based classification onto a 2.5D height map [18] to generate signed distance fields suitable for navigation. In many of these cases, the fusion relies on the underlying geometric representation as a 2.5D height map. Consequently, these methods are unsuitable for environments with significant vegetation and overhanging clutter, as shown in Fig. 8.

D. LEARNING ON 3D REPRESENTATIONS
Learning-based methods on 3D data are a growing field of interest. However, research lags behind the vision-based community's performance on general learning tasks such as classification and semantic segmentation. In general, methods are often developed in the 2D image domain and then adapted to 3D [24]. The same recent review noted that the challenges of learning on 3D representations arise from the sparsity of data, the lack of salient features compared to image modalities, the large training data sets required, and large computational requirements. SCNNs allow for a reduction of computational time by ignoring empty voxels [25]. SCNNs using a U-Net architecture with skip connections have been shown to allow for the learning and embedding of features and surrounding context for different desired tasks, such as semantic instance segmentation [26], multi-object classification [27], scene completion [28], and detection [29]. Compared to our proposed method, these approaches rely on a combination of occupancy representation and/or (point- or voxel-wise) semantic labels, and not a rich ray-based probabilistic map. Further, we can leverage a small data set, collected in a few hours, compared to the (simulated) experience equivalent to 57 real-time years in [13], [28].

E. LEARNING FROM EXPERIENCE AND DATA SET
Generating accurately labeled data is essential for learning-based methods. For TE, currently available simulators do not offer enough physical or perceptual accuracy to replace real-world data gathering. Gathering self-supervised data by associating the robot's state with sensory data to produce high-quality ground truth data is referred to as learning from experience (LfE) [3], [4], [6], [13]. The positive, traversable case, where the robot successfully traversed, can be easily obtained and is commonly used. However, gathering non-traversable examples can be hazardous and costly, potentially damaging the robot, and hence heuristics or hand labels are typically favored in the literature. In contrast to the works above, in this paper we present a principled approach that probabilistically combines hand labels with robot experience to generate high-quality labeled data.
Additionally, few data sets are available in the target domain [30], [31], [32]. RUGD [31] contains a combination of images of natural environments, whereas Rellis3D [30] generates labeled point cloud data with image-to-lidar projection, annotating natural elements as ''grass'', ''bush'' and ''tree''. A pure point cloud data set containing point-wise labels in off-road environments devoid of dense underbrush or challenging vegetation was announced by Bae et al. [32] but, to the best of the authors' knowledge, has not yet been released. A data set containing ray-based features and labels is not available.
Our paper builds on existing literature's findings by leveraging a rich probabilistic map representation and the SCNN's ability to incorporate environmental context to form a novel, high-performing, real-time capable system.Our work further differentiates itself from the SOTA by only using a limited amount of real-world data in combination with robot experience.The resulting method is capable of performing accurate TE in environments more challenging than previously demonstrated by other methods.

III. METHODOLOGY
A. PROBLEM DEFINITIONS AND METHOD OVERVIEW
The goal of this work is to generate a full 3D TE map suitable for allowing an AGV to navigate in natural, densely-vegetated environments. This work represents the environment with a probabilistic voxel map M, containing non-overlapping 3D box volumes where each voxel m retains ray statistics. Further, we assume each voxel m ∈ M has a true binary traversability state τ ∈ {TR, NT}, where TR = 1 and NT = 0 represent the traversable and non-traversable states, respectively. This work uses a supervised learning approach to learn a parametric function f(x) that can estimate τ for each voxel from the input sample x described below, formally τ = f(x). We aim to learn a parametric model that can accurately assess τ on a map M that is continuously updated whilst the robot is moving and sensing the environment.
Fig. 2 provides a high-level overview of the proposed method for the inference step, allowing for online TE. A continuous stream of lidar measurements is fused into a 3D probabilistic voxel map in a static global frame during robot operation. The map is initialized with no prior values at the start of a deployment and is progressively updated as the robot traverses the environment and records sensor data. The map internally tracks multiple lidar ray statistics for each voxel through raycasting, commonly at 0.1 m resolution. Given sufficient measurements, the statistics within the voxel itself are assumed to converge and adequately represent the underlying environment, as further discussed in Section III-B. During deployment, a local region is extracted from the global probabilistic map at fixed time intervals. The local feature map contains all voxels and their statistics for a box volume centered around the current robot pose. The voxel statistics are used directly as the input features to the prediction network, avoiding any additional feature calculation (see Sec. III-B). The local feature map is then passed to an ensemble of prediction networks. The ensemble consists of N binary traversability classifiers (U-Net models) trained on the same data from different (random) weight initializations. Each model estimates a binary traversability state for each output voxel, generating a binary traversability map of the same size and resolution as the input map. In Fig. 2, red voxels are non-traversable and green voxels are traversable. Finally, a per-voxel traversability probability is calculated by averaging the binary traversability estimates of the N models. We use this deep ensemble to increase robustness against outliers and to generalize performance to novel environments. Ensembles have proved to be a popular choice for generating probabilistic estimates and uncertainty quantification from neural networks [33], although we do not provide calibration scores.
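The ensemble-averaging step can be sketched as follows. This is an illustrative reconstruction, not the ForestTrav implementation; the function name, the model callables, and the array shapes are assumptions:

```python
import numpy as np

def ensemble_traversability(voxel_features, models):
    """Average the binary per-voxel predictions of an ensemble of N
    classifiers into a per-voxel traversability probability.

    voxel_features: (V, F) array of per-voxel input features.
    models: list of callables, each returning a (V,) binary prediction
    in {0, 1}. Both names are illustrative placeholders.
    """
    votes = np.stack([m(voxel_features) for m in models])  # (N, V)
    return votes.mean(axis=0)  # per-voxel probability in [0, 1]

# Toy usage: three "models" voting on two voxels.
models = [lambda x: np.array([1, 0]),
          lambda x: np.array([1, 0]),
          lambda x: np.array([1, 1])]
probs = ensemble_traversability(np.zeros((2, 13)), models)
# probs[0] == 1.0 (unanimous), probs[1] == 1/3 (one vote of three)
```

Averaging hard votes in this way yields a probability that is directly interpretable as the fraction of ensemble members that consider a voxel traversable.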

B. PROBABILISTIC 3D ENVIRONMENTAL REPRESENTATION
The mapping process aims to estimate each voxel's state based on a continuous stream of lidar observations. The state is a set of different ray-based probability distributions assumed to represent the environment given sufficient sensor readings. Our approach uses normal distributions transform occupancy maps (NDT-OM) [21] to capture sub-voxel resolution geometric distributions and extends the number of statistics to allow for accurate traversability estimation in dense forest environments. More specifically, each voxel m contains: a 3D multivariate Gaussian distribution with probabilistic occupancy information, characterized by the mean position µ_NDT and covariance Σ_NDT = S_NDT S_NDT^T (stored as the triangular square-root covariance matrix S_NDT ∈ R^6 for computational efficiency); the number of rays that ended within this voxel, N_OCC; the occupancy probability in log-odds form, l_OCC; and the ''hit'' and ''miss'' counts of the rays that have statistically ended in or passed through the normal distributions transform (NDT) distribution within a close enough margin. These voxel statistics have been previously used for permeability calculation in [4]. In addition, we also store in each voxel: the lidar intensity mean µ_INT and variance σ²_INT, which is relevant for chlorophyll-rich elements [4], and the number of multi-returns N_MR, which occur when a lidar beam is partially split, for example by thin elements or edges. Note that an increase in the number of second returns can be observed in voxels containing multiple small vegetation elements such as grass blades, leaves, and thin stems.
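A minimal sketch of such a per-voxel statistics container, with an incremental (Welford-style) update for the intensity mean and variance, might look as follows. The field names and the update helper are illustrative assumptions and do not reflect the actual OHM data layout:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class VoxelStats:
    """Per-voxel ray statistics (names are illustrative, not OHM's)."""
    mean_ndt: np.ndarray = field(default_factory=lambda: np.zeros(3))      # µ_NDT
    sqrt_cov_ndt: np.ndarray = field(default_factory=lambda: np.zeros(6))  # S_NDT (triangular)
    n_occ: int = 0               # rays that ended within the voxel
    log_odds_occ: float = 0.0    # occupancy probability, log-odds form
    ndt_hits: int = 0            # rays ending close to the NDT distribution
    ndt_misses: int = 0          # rays passing through it
    n_intensity: int = 0         # intensity sample count
    mean_intensity: float = 0.0  # µ_INT
    var_intensity: float = 0.0   # σ²_INT (population variance)
    n_multi_return: int = 0      # N_MR, partially split beams

    def add_intensity(self, value: float) -> None:
        """Incrementally fuse one intensity reading (Welford update)."""
        self.n_intensity += 1
        delta = value - self.mean_intensity
        self.mean_intensity += delta / self.n_intensity
        self.var_intensity += (delta * (value - self.mean_intensity)
                               - self.var_intensity) / self.n_intensity
```

For readings 1.0, 2.0, 3.0 this yields µ_INT = 2.0 and σ²_INT = 2/3 (population variance), matching a batch computation; the incremental form is what makes per-voxel statistics affordable during continuous lidar fusion.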
We rely solely on the values characterizing the distributions for the proposed method. This avoids computationally expensive additional feature calculations and allows us to learn the feature embedding directly. An input feature x ∈ R^13 is formed directly from these per-voxel statistics. Our previous work explored the relevancy of features calculated from these statistics [12]. We have developed our online mapping method on top of the occupancy homogeneous map (OHM) [22], an open-source 3D probabilistic mapping framework developed with performance in mind. This allows us to maintain numerous salient voxel statistics during online deployment. No other 3D probabilistic mapping framework is currently available that provides such feature-rich and real-time capable 3D representations.

C. DATA SET GENERATION
Our method relies on two different processes to generate the probabilistic map representation. All the maps are generated from recordings of real-world data, recorded as the raw sensor readings of a robotic platform (see Sec. IV-A).
The offline, post-processed probabilistic map M_Post is generated from all data of a particular experiment, using a globally optimized, post-processed trajectory that minimizes drift, produced by the SOTA Wildcat lidar-inertial SLAM algorithm [34]. This post-processing pipeline can handle long deployments and fuse data into large-scale maps on a consumer-grade laptop without needing to release or remove any data; the map is unbounded. The offline, post-processed maps generate the training and test data sets.
The online-generated probabilistic map M(t) for a particular experiment is the state of the probabilistic map at time t, given all previous sensor measurements and pose estimates from the start of the experiment t_0 up to t. This map is used for the online inference and qualitative evaluation presented in Sec. IV-E and the accompanying video. To ensure online real-time capability, a fixed-size local map around the robot is maintained due to memory and computation constraints. Compared to the post-processed map M_Post, M(t) uses a local, online SLAM estimate for the pose and may suffer from drift. Hence, to compare M_Post and M(t), M(t) is aligned to M_Post using the iterative closest point (ICP) algorithm before calculating error statistics [35]. Some alignment errors may persist between local M(t) maps and M_Post, leading to the incomplete coverage statistics shown in Fig. 10.
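The alignment step can be illustrated with a minimal point-to-point ICP. The paper uses an existing ICP implementation [35], so this numpy-only sketch, with assumed function names and a brute-force nearest-neighbor search, is a stand-in rather than the actual pipeline:

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst
    (Kabsch algorithm, no scale). Points are rows of (n, 3) arrays."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                          # proper rotation
    return R, mu_d - R @ mu_s

def icp(src, dst, iters=30):
    """Align online-map points `src` to post-processed-map points `dst`
    by alternating nearest-neighbor matching and Kabsch fitting."""
    cur = src.copy()
    for _ in range(iters):
        # Brute-force nearest neighbor in dst for every point in cur.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
        R, t = best_rigid_transform(cur, dst[d2.argmin(axis=1)])
        cur = cur @ R.T + t
    return cur
```

With a small initial misalignment (a few degrees and centimeters, as expected from SLAM drift), most nearest-neighbor matches are correct and the residual shrinks rapidly; production systems replace the brute-force search with a k-d tree.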
During robot deployment, AGVs commonly need to move into previously unobserved spaces. Initially, due to the limited observations, little information about the new area is available, and the quality of the map is low. The map quality evolves temporally and improves with the availability of additional sensor measurements, and often with the AGV moving. Therefore, the performance of TE depends on the quality of this spatially and temporally evolving map. Additionally, learning-based methods are commonly trained on post-processed maps, which have the highest possible quality. Hence, it is important to understand the performance implications for a method with regard to these temporal effects, which may arise due to distribution shifts between the training data and the data available during deployment. This can lead to reduced accuracy, as was observed in [12]. In this work, we additionally aim to understand how resilient an estimator's performance is to map quality.
The term ''map quality'' is a conceptual term that reflects how well the statistical distribution in each voxel represents the true environment and how well it has been observed. Voxels and associated feature statistics are assumed to be spatially independent, and combining different features or statistics is a common practice for data-driven and learning-based approaches. We treat time as a proxy for map quality and assume that the map improves as more data is gathered.

D. PROBABILISTIC COLLISION MAPS FOR LEARNING FROM EXPERIENCE
Any supervised data-driven approach for TE requires accurately labeled data. As noted in previous work, labeling the traversability of densely vegetated environments can be particularly challenging in practice. Therefore, in this work, we combine learning from robot experience (LfE) and human domain knowledge to generate accurately labeled data in a post-processing step. An additional collision layer is added to the map M_Post, which already contains the exteroceptive data. The collision layer tracks, for each voxel m, the likelihood of being part of a collision, l_c(m | c_1:t, T_1:t), and thus non-traversable. The trajectory T_1:t and collision states c_1:t are the sequences of recorded robot poses T_t and collision states c_t ∈ {TR, NT} at time t. A stationary binary Bayesian filter in its log-odds formulation is used to process the robot experience and update the voxels recursively. Compared to Octomap [20], we use the robot's pose and its collision state as the observation model instead of a range sensor, recording the robot's experience. For a voxel m, the update is performed as l_c(m | c_1:t, T_1:t) = l_c(m | c_1:t-1, T_1:t-1) + l_c(m | c_t, T_t). If the robot is in a collision-free state c = TR, then all voxels m within the robot's bounding box are updated. In the case of a collision c = NT, the voxels within the bounding box at the front of the robot, and up to a threshold distance beyond it (0.2 m), are observed as non-traversable. The rationale is that obstacles often extend beyond the voxel containing the object the agent directly interacts with, e.g. a tree trunk or thick bramble bush. The distance threshold value was found heuristically and depends on the environment, not the voxel resolution itself. This observation model assumes that collisions are due to the elements in front of the agent that prevent it from advancing when the robot moves predominantly forward. During the collection of the data set for this work, this was enforced by the operator turning only in safe areas and primarily moving forward. A visualization is
shown in Fig. 4. In practice, hand-labeled data is used to initialize the collision layer with fixed prior probabilities for each traversability class, with empirically selected values of p_NT = 0.3 and p_TR = 0.7. During data collection, an expert remotely operates the robot through vegetated environments and manually records collision events on the handset. The experience of the robot overrides hand-labeled data. The rationale is that the experience of the robot itself, failing or succeeding to traverse a region, is what we are trying to label. Issues arising from discretization are addressed using the probabilistic updates. Hence, this method allows us to generate high-quality data for our learning approach, combining domain knowledge and refining it with robot experience in a principled fashion.
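The log-odds bookkeeping behind this collision layer can be sketched as below. This is an illustrative reconstruction: the per-observation log-odds values and the mapping of the priors p_NT = 0.3 and p_TR = 0.7 onto the collision likelihood are assumptions, not values from the paper:

```python
import math

def logit(p: float) -> float:
    """Probability -> log-odds."""
    return math.log(p / (1.0 - p))

def inv_logit(l: float) -> float:
    """Log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-l))

# Assumed per-observation log-odds (free parameters of the filter):
L_FREE = logit(0.3)  # voxel inside the robot's bounding box, no collision
L_HIT = logit(0.7)   # voxel in front of the robot at a collision event

def update_voxel(l: float, in_collision: bool) -> float:
    """One recursive binary Bayes update, in log-odds form, of the
    likelihood that voxel m is part of a collision (non-traversable)."""
    return l + (L_HIT if in_collision else L_FREE)

# A voxel hand-labeled traversable starts at an assumed collision prior
# of 0.3; three collision-free traversals push it further toward TR.
l = logit(0.3)
for _ in range(3):
    l = update_voxel(l, in_collision=False)
print(round(inv_logit(l), 4))  # ≈ 0.0326
```

Working in log-odds turns each Bayesian update into a simple addition, which is why the same formulation is used by Octomap-style occupancy mapping.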

E. NETWORK ARCHITECTURE AND TRAINING PROCEDURE
We employ a 5-layer U-Net architecture [36] (see Fig. 5). This architecture has been shown to be suitable for the tasks of classification, scene completion, and TE in structured environments on sparse voxelized 3D data for robotic applications [28]. We differ from this work by leveraging a much richer feature set per voxel, where [28] relies on either occupancy or semantics, or a combination of both. The resulting network is relatively small (∼2M parameters versus ∼30M for a standard U-Net). We use cross-entropy loss with the TorchSparse implementation of SCNNs [37].
To combine the different post-processed scenes or maps into a data set format suitable for training, we split the maps into smaller cubes of 32 × 32 × 32 voxels. Each voxel appears in only one cube, and a cube requires at least n = 150 non-empty voxels to be considered valid. These map cubes are then concatenated into a single large data set. We employ classical ten-fold cross-validation with a held-out test scene (#9), ensuring a fair comparison against other methods. The input features are individually scaled using zero-mean normalization, with the scaling calculated on the training set, excluding the validation data. Different data augmentation methods are applied without violating the voxelization. To introduce noise, a 5% pruning chance is applied at the voxel level. Geometric augmentations are applied at the cube level, including mirroring, translation, and fixed-angle rotation (φ ∈ {0°, 90°, 180°, 270°} around the gravity-aligned axis, with uniform probability). The fixed rotation intervals are required to maintain alignment with the voxel representation. In addition, a random translation of 1-10 voxels is applied to each cube with a 50% chance in any axis-aligned direction.
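The grid-preserving augmentations can be sketched as follows. This is an assumed reconstruction (the random translation step is omitted for brevity), and all names are illustrative:

```python
import numpy as np

def augment_cube(coords, feats, rng, size=32):
    """Augment one sparse cube of (voxel index, feature) pairs with
    voxel pruning, mirroring, and 90-degree rotations about the
    gravity-aligned z axis, all of which keep voxels on the grid.

    coords: (V, 3) integer voxel indices in [0, size).
    feats:  (V, F) per-voxel feature rows.
    """
    # 5% chance of dropping each voxel, as a sensor-noise proxy.
    keep = rng.random(len(coords)) >= 0.05
    coords, feats = coords[keep].copy(), feats[keep]
    # Mirror along x with 50% probability.
    if rng.random() < 0.5:
        coords[:, 0] = size - 1 - coords[:, 0]
    # Rotate by a uniformly chosen multiple of 90 degrees about z.
    for _ in range(rng.integers(0, 4)):
        x = coords[:, 0].copy()
        coords[:, 0] = size - 1 - coords[:, 1]
        coords[:, 1] = x
    return coords, feats
```

Restricting rotations to multiples of 90° keeps every augmented voxel center on the original grid, so no re-voxelization or feature interpolation is needed.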

F. DATA SET
The data set generated as part of this work has been open-sourced,¹ and Table 1 provides a statistical summary. The data set contains a total of 1,016,243 points and covers approximately 3260 m² from nine different vegetated environments. Each point contains the raw distribution values of the layers, the TE labels, and the ray features from normal distributions transform traversability maps (NDT-TM) and forest traversability mapping (FTM) [4], [12]. This makes the data set useful for future novel methods and enables easy comparison. In addition, this data set contains high-quality hand labels generated directly on the point cloud itself, avoiding mislabeling due to occlusion and clutter in the vegetated environment when using point-to-image projection. The data was collected by an expert initially tele-operating an AGV in vegetated environments and recording collision states. Using the method described in Sec. III-D, the exteroceptive data was post-processed into M_Post and point clouds. The resulting point cloud was hand-labeled for traversability class by an expert with domain knowledge. The different data sources, M_Post, hand-labeled data, and collision maps, were finally fused as described in Sec. III-D. This produces a high-quality data set, and visualizations are shown in Fig. 6.
Table 1 summarizes the environments and data sets by scene at 0.1 m voxel resolution. The locations of the scenes can be seen in the bottom right image of Fig. 4. The second column shows the total number of labeled voxels N_labels. Columns three to six show the percentage of labels that are traversable (TR) or non-traversable (NT), and the labeling method: hand labels (HL) or learning from experience (LfE).
The seventh column shows the mean column vegetation density (CVD) per scene. In this work, we define CVD as a column-based metric that aims to capture how much of the volume above the ground, including a patch of ground, contains measurements relevant to TE for the robot. We use CVD as a proxy for vegetation density, as our environment only contains vegetation. Further, we only consider elements up to a meter above the ground. This is the region most AGVs will interact with and for which they need to reason about vegetation pliability. Elements above that height, such as the dense foliage of a tree, can skew the density metric. For each column of voxels, the set of voxels starting from and including the ground point up to a threshold of 1 m is considered. For that column, CVD is the ratio of the number of voxels containing measurements (N_meas) over the total number of voxels in the set (N_total): ρ_V = N_meas / N_total. It is free from sensor models, can be calculated on voxelized or raw point clouds, and can provide a comparison metric against other data sets. A higher value indicates greater density in a scene, and hence more potential traversability obstacles. The highest value of CVD corresponds to the presence of vegetation in every voxel of the column from the ground to 1 m height, while the lowest value corresponds to the presence of vegetation only at the ground (up to a height nearing the size of the voxel). Lastly, we provide the dimensions of the bounding volume for each scene in x, y, z gravity-aligned coordinates, the default reference frame for our SLAM system.

¹ Data sets available at https://data.csiro.au/collection/csiro:58941
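The CVD computation for a single column can be sketched as below (a minimal illustration; the exact handling of the ground voxel and the column boundary is our assumption):

```python
import numpy as np

def column_vegetation_density(occupied, ground_idx, voxel_size=0.1, height=1.0):
    """Column vegetation density (CVD) sketch for one x-y column of voxels.

    occupied:   1D boolean array along z, True where the voxel holds
                lidar measurements.
    ground_idx: z-index of the ground voxel for this column.
    Considers the ground voxel plus all voxels up to `height` metres
    above it and returns rho_V = N_meas / N_total."""
    n_up = int(round(height / voxel_size))        # voxels within 1 m of ground
    col = occupied[ground_idx : ground_idx + n_up + 1]
    return np.count_nonzero(col) / len(col)
```

With a 0.1 m voxel size this inspects the ground voxel plus the 10 voxels above it, so a column that is vegetated only at ground level scores roughly 1/11, while a fully vegetated column scores 1.0.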

IV. EXPERIMENTS AND RESULTS
In this section, we provide a comprehensive experimental evaluation of our proposed method. The first subsection gives an overview of the platform and implementation used in this study. The second details our method's experiments, contrasting them with the SOTA. The third offers an ablation study to understand performance, while the fourth analyzes the estimator's robustness to map quality. The final subsection illustrates qualitative examples using our open-source forest data set.
We evaluate the performance of the proposed method using the Matthews correlation coefficient (MCC) [38]. MCC is a metric for assessing binary classification that is suitable for imbalanced data sets. Unlike the F1 score, it considers all four classification cases (TP, FP, TN, FN), making it robust against class imbalance and invariant to class swap [38]. Valid MCC scores range from -1 to 1, where 0 corresponds to random chance, 1 to a perfectly correlated model, and -1 to a negatively correlated model. Other commonly used classification metrics, such as the F1-score, precision, and recall, overestimate an estimator's performance in cases of high class imbalance and depend on the choice of the positive class. In terrain traversability estimation, the common convention is to choose the traversable class as the positive class, which is typically also the majority class. Hence, we argue that using the F1 score alone may not be suitable for this application. Nonetheless, the F1-score is provided in this paper along with the MCC to allow for some comparison with the SOTA. In the cases where cross-validation is used, the mean µ and standard deviation σ are reported to provide insight into how well the models are trained and generalize.
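To make the class-imbalance argument concrete, a small sketch with hypothetical confusion counts shows F1 rewarding a majority-class-biased classifier while MCC does not:

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from binary confusion counts.
    Uses all four cells; range [-1, 1], 0 ~ random chance."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

def f1(tp, fp, fn):
    """F1 score; depends on which class is chosen as positive."""
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0

# Hypothetical imbalanced scene: 1000 traversable vs. 50 non-traversable
# voxels. A classifier biased toward the majority (traversable) class
# still scores a high F1 here, while MCC stays close to random chance.
high_f1 = f1(tp=950, fp=45, fn=50)
low_mcc = mcc(tp=950, fp=45, tn=5, fn=50)
```

The counts are invented for illustration only; they mirror the paper's argument that the traversable class is typically the majority class.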

A. EXPERIMENTAL PLATFORM AND IMPLEMENTATION DETAILS
Data was collected with a Dynamic Tracked Robot (DTR), a 35 kg tracked vehicle equipped with a sensor pack containing an IMU, a rotating Velodyne VLP-16 lidar angled at 45°, and four RGB cameras. The lidar is mounted on the robot and performs a full revolution at 2 Hz. Fig. 1 depicts the vehicle in an environment representative of this work. Further details on the platform can be found in [12].
The following learning parameters are used for training our method: learning rate l_r = 10⁻⁴, weight decay w = 9 × 10⁻⁴, batch size b = 64, early-stopping patience e_s = 5, and maximum epoch number epoch_max = 250, with average convergence between 50-100 epochs. The ADAM optimizer was used to train the models. The loss calculated on the validation set was used as the early-stopping criterion. Hyperparameters were tuned using a grid search, choosing the settings with the lowest MCC-score variation for a ten-fold cross-validation on the training data only. To compare the different methods, ten-fold cross-validation was used with a hold-out test set. Model weights are generated, stored, and retrieved using the PyTorch library [39] with the TorchSparse extension [37].

FIGURE 6. Top far left: QCAT testing facility in Queensland, Australia, and the approximate locations where the data sets were gathered. [1] and [2]: robot in forest environments from the robot and external viewpoint, respectively. [3] and [4]: robot in tall grass with some large trees. [7]: environment with a mixture of small to medium trees and patchy vegetation.
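The early-stopping criterion (patience e_s = 5 on the validation loss) can be sketched as a small helper; this is our illustrative reconstruction, not the authors' code:

```python
class EarlyStopping:
    """Stop training when the validation loss has not improved for
    `patience` consecutive epochs (e_s = 5 in this work)."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `step` would be called once per epoch with the validation loss, terminating training once the counter reaches the patience threshold.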
Training time was approximately 210 s per model on a Precision 7750 laptop equipped with a Quadro RTX 5000 Mobile Max-Q, 64 GB of system memory, and a 2.30 GHz Intel Core i7-10875H CPU. On this hardware, the ensemble of 10 models processes each local map at 0.1 m voxel resolution at 3 Hz for a 10 m × 10 m × 2 m scene and above 1 Hz for a 20 m × 20 m × 5 m scene.

B. EVALUATION 1) ACCURACY AND COMPARISON TO SOTA 3D METHODS
We evaluate our proposed method against SOTA TE methods that utilize 3D voxel representations and range sensors. Our comparison includes the classical Constant Threshold Classifier (CTC), which relies on the NDT representation and leverages heuristics of roughness and slope angles [40]. The CTC method was tuned on the test set (θ_threshold = 30° and ρ = 0.001) to demonstrate the best heuristic performance achievable by this method in an ideal scenario. Further, we compare against NDT-TM [4] and FTM [12], two SOTA methods that leverage 3D probabilistic voxel representations to tackle TE in vegetated environments. NDT-TM and FTM were trained as described in [12], and the reported results were reproduced to ensure a fair comparison. All methods were tuned as well as possible and were evaluated and compared on the unseen test set, Scene #9. This aligns with common machine learning practice to ensure fair evaluation and comparison. In addition, it avoids possible domain knowledge leakage during training, a known problem with cross-validation approaches. Results for voxel sizes of 0.1 m and 0.2 m are presented. Lower resolutions are considered too coarse for the cluttered environment. For online deployment, higher resolutions are significantly more computationally expensive and provide diminishing returns in discriminating the environment, particularly since NDT allows for sub-voxel resolution. Additionally, they require denser lidar scanning (more lidar beams) to ensure that enough laser rays traverse each voxel, ensuring the convergence of voxel statistics.
Table 2 summarizes the results across methods. It shows that our proposed ForestTrav method significantly increases TE classification accuracy over other 3D voxel-based techniques, with MCC increasing from 0.41 to 0.62 over the best performing alternative method (FTM) for 0.1 m voxel resolution, and from 0.38 to 0.49 for 0.2 m resolution. F1 scores obtained by ForestTrav are also significantly increased, reaching a high-performing 0.82 at 0.1 m resolution compared to 0.61 for FTM. The relative performance between the SOTA methods aligns with the observations reported in [4], [12]. At higher resolutions, the geometric method CTC outperforms NDT-TM. Note that the initial results in [4] were reported at much lower voxel resolutions (0.4 m). Fig. 7 A and B illustrate the colorized, voxelized point cloud and the ground truth of the test set at a 0.1 m voxel resolution, encompassing open spaces and various vegetation densities. Significant vegetation near the ground, a typical challenge in forest navigation, must be considered for safe robot traversal. NDT-TM accurately classifies all non-traversable elements but fails to distinguish pliable vegetation near the ground, mislabeling it as non-traversable (orange). In contrast, FTM exhibits better discrimination of traversability probability but similarly misclassifies significant vegetated elements, albeit less often. However, neither method would allow an AGV to traverse this environment. ForestTrav generally identifies traversable and non-traversable elements accurately and would allow the robot to navigate safely in the scene. However, it misclassifies a central small tree trunk, the foot of some tree trunks, and some trees on the scene's edges as traversable. We view the foot of the trees as a low concern, given that many voxels above are correctly classified as non-traversable, which would prevent planning through the tree. The trees on the edges of the scene are believed to be border effects occurring where the scenes were cropped. We have verified that these elements are correctly classified with more context (not part of the cropped test scene). Furthermore, we note that the model's traversability probability estimations are close to either extreme (0 or 1), with few values in the middle range. Ensembles are generally used as a technique to generate probabilities and epistemic uncertainty from neural networks, but can be overconfident in their predictions [41].

2) PERFORMANCE FOR DIFFERENT VEGETATION DENSITIES
In this experiment, we aim to understand whether the three methods show differences in performance with respect to the amount of vegetation present, represented by the proxy CVD introduced in Sec. III-F. MCC summary statistics for these assessments were computed for test scene #9. All voxels were associated with a CVD score based on the column they fall into, even those above 1 m. Points in voxels above 1 m are included in the evaluation to keep the same baseline of comparison as in the results presented in Sec. IV-B1. Fig. 8 shows an overall performance gap between ForestTrav and the other methods for all CVD values. Firstly, we note that ForestTrav shows much higher accuracy than the SOTA methods over the full range of CVD values. Secondly, all three methods show a trend where performance is higher at low and high CVD, with reduced performance for intermediate values. The high performance for the extreme cases of CVD could be due to seemingly easier cases, where a column contains either little vegetation above the ground (lowest CVD) or fully contains a large, dense obstacle, such as a tree trunk (highest CVD). The plot suggests that intermediate CVD values are the hardest cases to assess accurately, as intuition would also suggest. The dashed brown histogram (note the distinct scale on the vertical axis on the right of the graph) indicates the number of points used per CVD bin. Most of the data lies between 0.2 and 0.6 CVD, even though points above 1 m were included. This shows the need to increase performance in those areas in particular, as that is where the methods struggle most.
3) COMPARISON AGAINST SOTA 2.5D METHODS

By comparing our approach with two 2.5D geometric elevation map methods, we aim to highlight the challenges and constraints of 2.5D methods in our target environment. The first method, CSIRO-TE, creates a 2.5D elevation map from the same 3D probabilistic voxel representation we use but does not leverage any other salient features. Traversability is assessed using geometric cues, e.g., slope steepness and step height [2]. The second method, Elevation Mapping CuPy (EM-CuPy), relies on a GPU-based implementation for elevation mapping. This approach uses a learning-based model with different receptive fields [18]. We use the same data sources for all methods and tune each approach as well as possible for the current environment. Both SOTA methods were successfully demonstrated in long-term navigation during the DARPA SubT Challenge [1] and used by the top two competitors. For a fair comparison, we compressed our 3D maps containing the traversability estimates of our method into 2D grid maps at the same resolution (0.1 m). Each grid cell is estimated to be either traversable or non-traversable and corresponds to a map column. For each column of the voxel map, we designated the lowest point as the ground voxel and considered all traversability predictions of the set of voxels in the column from, and including, the ground up to 1 m. We used a conservative approach, where the corresponding grid cell was considered traversable only if all voxels in the set were traversable.
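Our reading of this conservative compression step can be sketched as follows (the array layout and function names are our own assumptions, for clarity only):

```python
import numpy as np

def compress_to_2d(traversable, ground_z, voxel_size=0.1, height=1.0):
    """Conservative 3D -> 2D cost-map compression sketch.

    traversable: (X, Y, Z) boolean per-voxel traversability predictions.
    ground_z:    (X, Y) z-index of the lowest (ground) voxel per column.
    A grid cell is marked traversable only if every voxel from the
    ground voxel up to 1 m above it is predicted traversable."""
    n_up = int(round(height / voxel_size))
    X, Y, _ = traversable.shape
    grid = np.zeros((X, Y), dtype=bool)
    for i in range(X):
        for j in range(Y):
            g = ground_z[i, j]
            # all() realizes the conservative rule: one non-traversable
            # voxel in the column blocks the whole cell
            grid[i, j] = traversable[i, j, g : g + n_up + 1].all()
    return grid
```

A single non-traversable prediction anywhere in the 1 m column is enough to block the cell, which is what makes the 2D comparison conservative with respect to the full 3D estimate.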
Our 2D cost map computed from ForestTrav shows the highest performance of all the methods in Table 3, indicating the benefits of estimating TE in full 3D before compressing it into 2D. CSIRO-TE maintains a similar voxel representation at the same resolution as our method, but cannot assess the environment accurately. Alternatively, EM-CuPy uses a high-resolution 2.5D map but is still ill-suited for environments with overhanging elements and high clutter, as illustrated in Fig. 9. It also fails to deal with grass elements and small bushes, since they are assumed to be rigid obstacles. 2.5D geometric representations like these are commonly used as a backbone for SOTA appearance-based or mixed TE estimation methods [9].

C. FEATURE ABLATION
An ablation study was performed to understand the benefits and limitations of using distribution values directly, compared to classical feature calculations. Six feature sets are examined: F_OCC, F_RGBO, F_NDT-TM, F_FTM, and two versions of our method using the distributions directly, F_ForestTrav and F_ForestTrav+RGB. The first feature set contains only the occupancy features F_OCC = [N_OCC, l_OCC] from NDT-OM [21]. F_RGBO contains the occupancy features with an additional per-voxel mean estimate of the RGB color channels. F_NDT-TM and F_FTM are the feature sets used in the respective methods [4], [12]. Lastly, F_ForestTrav+RGB is the presented lidar-only method augmented with RGB color information per voxel. The same training and testing procedure as in Sec. IV-B1 is used. The findings in Table 4 indicate that our method's performance is comparable whether it uses RGB information or the NDT-TM feature set. While the pure NDT-OM occupancy-based variant shows impressive performance, there is a clear indication that additional salient features introduce benefits at both resolutions. This highlights the SCNN's ability to utilize context at 0.1 m resolution, an ability that diminishes at the lower (0.2 m) resolution. The other variations perform comparably at this lower resolution, except NDT-TM, which outperforms all others significantly.

D. ESTIMATOR ROBUSTNESS TO MAP QUALITY
This experiment demonstrates a novel method for evaluating the robustness of a TE model with regard to map quality, as described in Sec. III-C. It aims to quantify the TE performance of an AGV as it explores a new environment, when the map quality is initially low and increases with continuous observation of the environment. In addition, this examines whether there is a distribution shift between the training data (post-processed data) and data acquired in real time. To simulate this process, we extracted voxel maps at discrete time intervals along the robot trajectory. We compared the estimator's prediction for each time-dependent map M(t) against the post-processed ground-truth map M_Post for an unobserved test scene (#9). The results are comparable to Table 2, since the same test scene was used. Fig. 10 shows the performance of the TE estimators under this setting. The solid lines illustrate the performance of the compared volumetric approaches over time. The brown dashed line shows the association of the probabilistic map M(t) to the final ground-truth map M_Post. It approximates how much of the scene has been observed at least once; the percentage grows continuously until near full coverage. As indicated in Sec. III-A, the differences can be explained by the odometry and SLAM trajectory alignment. Our method shows high performance throughout the full evaluation. Performance starts slightly above 0.7 MCC and decreases slightly to a near-steady state after ∼30 s. This indicates a combined reliance on context and features for accurate TE. An MCC score similar to that in Table 2 indicates that our method is robust for deployment. The FTM method starts below 0.2 MCC and increases until it reaches 0.4 MCC after ∼60 s, subsequently remaining in a steady state with performance comparable to the previous experiment. This indicates that FTM is less robust to map quality but can reach the same performance as in Table 2, given sufficient information and time. This makes the method more suitable for offline processing than online deployment; however, it is still much less accurate than our proposed method. Similarly, NDT-TM performance increases continuously until it reaches a steady state, though it never reaches the same performance as in Table 2. This shows that its performance degrades during deployment when exploring novel environments; possible reasons include overfitting or a distribution shift between the training data and the data acquired during deployment. ForestTrav is clearly more robust to map quality than the other SOTA methods. This evaluation also offers a practical way to characterize and gauge the suitability of a TE method for exploration. In an exploration task, one would expect to push into new environments continuously. Hence, the strong performance of ForestTrav early in the trajectory is significant.
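The temporal evaluation amounts to scoring the estimator on each snapshot M(t) against M_Post; a schematic version (the data structures are our own simplification of the paper's maps) could be:

```python
def robustness_curve(snapshots, ground_truth, predict):
    """Map-quality robustness sketch: score a TE model on voxel maps M(t)
    extracted at discrete times along the trajectory, against the
    post-processed ground-truth map M_Post.

    snapshots:    list of (t, map_t) pairs.
    ground_truth: dict {voxel_key: True/False traversability label}.
    predict:      callable mapping a voxel map to {voxel_key: prediction}.
    Returns (t, mcc, coverage) triples, where coverage is the fraction
    of ground-truth voxels already present (associated) in M(t)."""
    curve = []
    for t, map_t in snapshots:
        preds = predict(map_t)
        keys = [k for k in preds if k in ground_truth]  # associated voxels
        tp = sum(preds[k] and ground_truth[k] for k in keys)
        tn = sum(not preds[k] and not ground_truth[k] for k in keys)
        fp = sum(preds[k] and not ground_truth[k] for k in keys)
        fn = sum(not preds[k] and ground_truth[k] for k in keys)
        den = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
        mcc = (tp * tn - fp * fn) / den if den else 0.0
        curve.append((t, mcc, len(keys) / len(ground_truth)))
    return curve
```

The coverage value plays the role of the brown dashed association line in Fig. 10, while the per-snapshot MCC traces the solid performance curves.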

E. QUALITATIVE EVALUATION
Fig. 11 shows the TE classification for an exploration setting where the robot drives from A) to E), remotely controlled by an operator, and encounters different scenarios in a new, previously unseen environment. The examples show the online inference of our method on raw, real-world sensory data. A wider and heavier platform, but with a similar height and similar sensor configuration (double the spin rate), was used. Image A shows the TE probability 20 s after the mission starts, and B approximately 10 s later. In A), various green foliage elements can be seen, as well as thin stems littering the environment. From the robot's point of view, many parts of the environment are initially obstructed but can be observed later if the robot moves slightly forward. Single viewpoints can be limiting, with regular occlusions, showing the need for a probabilistic method to fuse sensor readings. In image C, the robot was driving through tall grass. It correctly assessed the vegetation as traversable but had little information about the ground in front of it, as the grass obstructed almost all of the environment except the patch near the robot. D) and E) show scenes further into the forest, where dense foliage is less prominent but thin, tall, small tree stems are frequent and can be challenging to assess. Overall, our method assessed the environment accurately and qualitatively sensibly. The presented scenes provide additional examples to those in the accompanying video. We note that clutter and occlusions are frequent problems, making this a challenging unstructured environment, and that a probabilistic map helps to alleviate this.

V. DISCUSSION
ForestTrav is a novel, real-time method capable of accurate TE in densely-vegetated environments. It relies solely on a salient 3D probabilistic representation and leverages contextual information in a principled manner using an SCNN, showing significantly increased TE accuracy compared to the SOTA. We also demonstrated that the method generalizes well, showing qualitatively comparable performance over data sets gathered in these environments. The feature ablation study suggests that using the distribution values directly avoids the additional computational effort of feature calculations, particularly the expensive eigenvalue decomposition. Additionally, we note that the performance of pure NDT-OM, using only the occupancy distribution as features (F_OCC), is quite accurate, but less so than the lidar-only method. These findings indicate that the SCNN can leverage contextual information from the environment to make accurate decisions, but requires sufficient contextual information and resolution. TE using occupancy alone could be sufficient for many environments containing only small amounts of vegetation, making it easy to implement and deploy. Our proposed method is more suitable for densely vegetated environments, whilst the low computational requirements (model size ∼17 MB), short training time (below 4 minutes), and small amount of training data required still allow it to be broadly applicable.
The method shows limitations in cases where the robot moves too fast to gather enough information, where the vegetation is too dense for lidar to penetrate, or where it fails to discriminate large bushes or brambles. The method may be over-confident in its traversability predictions at the map's borders, which is common for deep learning methods. Further, our method currently cannot interpolate between missing data or perform any direct form of scene completion, possibly preventing access to an area if it cannot be sufficiently observed. This occurs due to occlusions from vegetation and is a major challenge in vegetated environments. Probabilistic maps mitigate this to a large degree and are superior to single-viewpoint assessment due to their ability to fuse previously recorded data from multiple viewpoints. However, the approach will still fail if the density is too high. Since our method can maintain a large local representation of the environment, it can mitigate some of these issues by re-planning and finding less risky paths and better-observed areas.
The comparison against SOTA 2.5D methods places our method in the context of recent work deployed by two of the most successful teams in the DARPA SubT Challenge [1], [2]. The comparison against CSIRO-TE, which uses a similar voxel representation, shows the need for salient features and an appropriate estimator to achieve sufficiently accurate performance for real-world deployment in these challenging conditions. On the other hand, the comparison against [18] highlights the issues that occur even with high-resolution (4 cm) 2.5D maps, even when leveraging SOTA learning methods. Further, SOTA appearance methods commonly fuse visual appearance estimates with these 2.5D representations, e.g., [9]. The data in our target environment suggests these 2.5D representations are insufficient for our proposed environments, regardless of the accuracy of the image-based modules.
Color and semantic information have been reported as powerful features for classification [3], [6]. We found that including color features generally yields performance comparable to our lidar-only approach, but can degrade classification performance in some cases, as similarly reported by Bradly et al. [5]. The lidar-color fusion used was extensively tested and deployed in [2], but environmental differences resulted in increased error. Potential sources are frequent camera occlusions, color distortion due to changing lighting conditions, or voxels containing elements of different colors. The difficulties in these environments require novel, robust solutions to leverage appearance information within a high-resolution voxel representation in a principled manner, which is beyond the scope of this work.
The novel analysis of the estimator's robustness to map quality provides an additional introspective tool. The combination of features from different representations is typical for learning-based TE methods. Each voxel, and each of the statistical distributions (or representations) within a voxel, is assumed to be independent for computational tractability. Hence, quantifying the joint uncertainty or entropy of a mixture of the features/statistics of a single voxel is often infeasible due to these assumptions. Therefore, the presented evaluation of an estimator's robustness is a practical way to assess TE performance for methods actively exploring previously unseen areas.

VI. CONCLUSION
In this paper, we proposed ForestTrav: a real-time capable, learning-based TE method effective in challenging, dense, and cluttered vegetated environments. We presented extensive quantitative and qualitative evaluations indicating that ForestTrav significantly outperforms SOTA methods. Our approach leverages the environmental context captured by the SCNN and the salient features of a 3D probabilistic voxel representation to generate an online-capable system. In addition, it is trained solely on real-world data and is cost-effective in terms of training data and time. Through a comprehensive evaluation, we demonstrated the performance and generalization capabilities of our TE method, including a novel analysis assessing TE in response to the map quality evolving as the robot moves through the environment and collects more sensor data. Further, we demonstrated that a pure ''geometric method'' is capable of accurately assessing pliable vegetation, a capability that has not previously been shown in the literature.
Current limitations around non- or partially-observed areas could be addressed by considering scene completion. Further investigation into uncertainty quantification would allow assessment of which areas of the map are poorly observed and which elements are out-of-distribution with respect to the training data. This combination could enable targeted active learning, where the robot purposely interacts with uncertain areas in the map, encouraging learning in new environments and in areas with previously unseen samples. This would allow the agent to adapt to a novel environment online and in real time, enabling fast adaptation and avoiding costly labeling of data for novel environments.

FIGURE 1 .
FIGURE 1. Top: robot navigating in a forest with light to dense vegetation. Bottom: traversability estimation using ForestTrav. Lidar data are accumulated into a probabilistic 3D voxel map. The map is then passed to a sparse convolutional neural network to estimate per-voxel traversability.

FIGURE 2 .
FIGURE 2. Method overview (inference): During robot deployment, a continuous stream of lidar measurements is fused into a single probabilistic 3D voxel map, representing the environment with per-voxel statistics. A local feature map is generated to assess the traversability of a local area around the robot's current pose. Each of the N models independently classifies voxels as either traversable or non-traversable. The ensemble creates the traversability probability for each voxel by taking the mean of the N binary classifications. A sample scene is shown on the top left, with the robot in red.

Fig. 3
illustrates the offline training by splitting K post-processed voxel maps into the training data set. Each map is split into equally sized 3D cubes. This process is illustrated for a single map, and each cube is visualized with a different color. All the cubes of all K maps are combined into a single training data set. The test data is a separate map evaluated without splitting. Detailed descriptions are provided of the data set used (Sec. III-F) and the training details (Sec. III-E).

FIGURE 3 .
FIGURE 3. Training data generation: The training data is generated based on the post-processed map, using offline-optimized poses. The hand-labeled data set is fused with the robot experience. The fused post-processed traversability map is split into smaller cubes suitable for training our method. This is repeated for each scene and added to the reference database containing all training data. The exception is the separate test set, where the scene is not split into smaller cubes. Details of each step are provided in Secs. C-F.

FIGURE 4 .
FIGURE 4. Left: Illustration of instances of a robot traversing or colliding with environmental elements. The red bounds indicate the voxels that may cause collisions; green boxes are voxels that the robot successfully traversed. Right: The probabilistic collision map from the robot experience only. Dark purple is non-traversable, yellow traversable, and green uncertain. The red arrows are discrete poses of the trajectory. The map correctly captures the voxels most likely to be responsible for the collisions (tree trunk) and the adjacent traversable grass, without any discretization effects.

FIGURE 5 .
FIGURE 5. Overview of the U-Net architecture used in this work showing the number of channels, kernel sizes, strides, and skip connections.

FIGURE 7 .
FIGURE 7. Left column: A) RGB voxel cloud and B) ground truth labels of the test scene. Second column: C) traversability probability of NDT-TM and D) classification results compared to the ground truth. Green is true positive (TP), blue is true negative (TN), orange is false negative (FN), and red is false positive (FP). The third and fourth columns show the same for FTM [12] (E, F) and ForestTrav (ours) (G, H).

FIGURE 8 .
FIGURE 8. Comparison of the methods' performance over vegetation density (solid lines) and the number of samples used for each MCC score (dashed line).

FIGURE 9 .
FIGURE 9. Qualitative comparison of a 2.5D representation to our method. The robot model is shown in red. The color indicates TE: yellow (traversable) and blue (non-traversable). Each column shows a different scene. The top row shows the first-person camera view from the robot in the scene, the middle row ForestTrav, and the bottom row EM-CuPy. Left: Area with trees and many thin, small tree stems (traversable) in tight spaces. ForestTrav correctly assesses them. EM-CuPy wrongly assesses them as non-traversable and shows high height variation (spikes) due to many conflicting height measurements. Right: Overhanging branches at different heights. ForestTrav can represent the environment in full 3D and assess it correctly. EM-CuPy closes off the path due to the overhanging obstacles, a limitation of 2.5D representations.

FIGURE 10 .
FIGURE 10. Classification performance of the different methods at different times t in an unobserved environment. The brown dashed line shows the association percentage between the temporally evolving probabilistic map M(t) and the post-processed ground-truth map M_Post, a proxy for the amount of the scene explored. All methods were tuned to the best of our capabilities.

FIGURE 11 .
FIGURE 11. Qualitative examples of online TE along a trajectory in a novel, unknown environment. The top left image shows a bird's-eye view of the trajectory, with letters indicating the locations of the other scenes. Scenes A)-E) show the robot model (red) with the robot's current camera view (bottom left) and the voxel-based traversability probability results, colored as in previous scenes. The images were captured from ForestTrav on real-world data in real time. For this experiment, a different sensor pack with the same sensor configuration but double the spin rate was used, showing generalization to different instances of the same sensor.

TABLE 1 .
Overview of the data set.

TABLE 2 .
ForestTrav classification performance compared to the SOTA.

TABLE 4 .
Ablation study for different feature sets used.