Extracting Statistical Signatures of Geometry and Structure in 2D Occupancy Grid Maps for Global Localization

Global localization (or place recognition) is a method of finding the current location of a robot on a map generated by a mapping process, and it is an open field that has not yet been completely solved in the field of mobile robotics. Most existing approaches to global localization are based on extraction of interest point features and their descriptors whether from raw laser scans or 2D occupancy grid maps. In this paper, unlike most approaches, we propose a novel method of extracting a statistical signature of geometric and structural features from a submap. A boundary and free-space features can characterize a geometric shape, while a reflection symmetry can quantify a structural shape of the submap. Experiments using five pre-built map publicly available demonstrate that the proposed method outperforms the other state-of-the-art image-based methods by examining precision-recall curve especially when occupancy noise added to the submap is progressively increased.


I. INTRODUCTION
R ELIABLE positioning of mobile robots is essential for high-level tasks such as guidance, delivery, surveillance, and cleaning. This is referred as localization in mobile robotics literature. Localization includes local localization and global localization [1]. Local localization aims to compensate accumulated location errors based on previous position and sensor data during robot navigation. On the other hand, global localization can locate robots without prior knowledge of the location, allowing to solve the kidnapped robot problem. This is more significant and necessary than local localization in situations where the robots are likely to experience severe positioning errors.
There are many methods of global localization depending on the type of sensors used in map building. As a general process, features are extracted from maps, stored in a database, and then compared with sensor data obtained from a current robot position. Loop closure detection [2], map merging for distributed multi-robot SLAM [3], and place recognition [4]- [6] in mobile robotics also deal with problems similar to global localization.
Existing approaches to global localization for mobile robots can be divided into two categories. The first is a method of extracting interest point features (i.e., keypoints) from 2D laser scan data and expressing them as descriptors using surrounding information around the features [4], [7]. By using acquisition characteristics of the laser scan data, keypoints are extracted based on range, normal, curvature, and orthogonality of the scan data. Descriptors are used for direct matching between keypoints or to generate scan signatures in conjunction with a bag-of-words (BoW) framework [5]. However, when applying in cluttered environments or using a laser scanner with low angular resolution, this method may fail to extract distinctive keypoints from a single scan due to sensor noise or its sparseness.
The second method is to convert accumulated laser scans into a 2D occupancy grid map and extract keypoints from this image. The keypoints and their descriptors can be extracted using SIFT [8] and SURF [9] which are representative methods in the field of visual object recognition. This approach has the advantage of reusing many algorithms adopted in computer vision, but is prone to inconsistencies due to noise generated in the process of projecting the scan data onto the image.
In this paper, we propose a new method to extract statistics of occupied and unoccupied areas in submap images instead of extracting keypoints. The proposed method divides the submap into boundary space and free-space, and represents distribution characteristics of each space as histograms to generate submap signatures. These two histograms reveal the geometric information of the submap. In addition, a symmetry score is introduced to quantify the structural information of the submap, and is used as a weight when calculating similarity between submaps.
In summary, the contributions of this paper are (1) to quantify geometric distribution of boundary and free-space as statistical signatures rather than extracting keypoints, (2) to introduce reflection symmetry to describe structural information (overall shape of submaps), and (3) to integrate geometric and structural information of submaps explicitly to form as one metric used for submap matching.

II. RELATED WORK
Place recognition (i.e., Global localization) is a field that is being continuously studied in the field of mobile robotics. There are a method using an image itself through deep learning and a method using 2D/3D maps generated by several types of sensors [10]. Since this paper deals with place recognition through the feature analysis of 2D grid maps, this section mainly focuses on the method using 2D LiDAR sensors.
Bosse and Zlot [4] outlined first works on extracting keypoints from raw laser scans and defining the information around the keypoints as descriptors. They applied combination of two keypoint detectors and six descriptors to a large-scale SLAM problem, reporting that combination of the curvature cluster-based keypoint detector and the moment grid descriptor showed the best performance. Since the descriptor was generated using information within a relatively large radius (i.e., 9 m) centered on the keypoint, it can be seen that descriptors play a role in encoding the entire submap, not as a local descriptor.
Fast Laser Interest Region Transform (FLIRT), a multi-scale interest region operator for 2D range data, is proposed in [7]. FLIRT is a method of extracting keypoints based on a curve approximation of a laser scan and defining occupancy information in the polar region around the keypoint as a descriptor. This approach was used together with RANSAC to prove its effectiveness by applying it to solving loop closure and global localization in indoor and outdoor environments.
FLIRT has been further extended to geometrical FLIRT phrases (GFP) for practical applications to large-scale datasets [5]. The idea of this approach is based on BoW scheme with introducing a weak geometrical verification. Each keypoint from a raw laser scan is associated with a predefined word, and a histogram of word frequency is used as a signature of a place. Subsequently, for higher retrieval performance in outdoor environments with a lot of laser scan data, geometrical landmark relations (GLARE) was proposed [6], which describes spatial relations of landmarks as a scan signature unlike BoW method. A 2D histogram of relative orientations and distances of cooccurring landmarks is defined as a scan signature through a soft-voting scheme which can deal with sensor uncertainties and noisy range data. It showed a faster search performance than GFP by applying an approximate nearest neighbor search in the query process.
Another method is to convert a raw laser scan into a 2D image and extract keypoints from the image using existing computer vision techniques. Li and Olson [11] proposed a method to extract stable keypoints by rasterizing LiDAR data by projecting it into 2D Euclidean space, followed by applying a multi-scale Kanade-Tomasi detector, a variant of a Harris corner detector. This method is easy to apply to SLAM applications because it uses the covariance matrix that naturally accompanies the corner detector algorithm to calculate spatial uncertainty of keypoints. Experiments with 2D/3D datasets have demonstrated that the repeatability performance of the keypoint detector was superior to other methods.
Millane et al. [12] proposed a method of generating a signed distance function (SDF) map by applying distance transform after thresholding an occupancy probability grid of a submap. In this SDF map, points with high curvature were defined as keypoints, and their descriptors were constructed by adding an average SDF value to an orientation histogram of SIFT descriptors. Since the keypoint and descriptor extraction was performed on the SDF, it has the advantage of simultaneously capturing features of geometry of free and occupied space. This work has recently been extended to solve 3D localization [13].
A map merging method for multi-robot systems using spectrum analysis of a 2D grid map was proposed in [3]. For submaps created by different robots, orientations of the submaps were first compared using Hough spectra, and translations were obtained with additional X and Y spectrum through a projection to the x and y axes of the submap image. This approach has been proposed as a map merging for multi-robot systems, it can be also applied to solve global localization or loop closure.
Unlike the methods based on the extraction of keypoints and descriptors, the proposed method defines the geometric and structural distribution of whole submaps as statistical signatures. Since it does not rely on direct matching of keypoints extracted from submaps, it is insensitive to small noise or unexpected appearance of objects when constructing 2D grid maps. Moreover, the proposed method is compatible with any sensor capable of generating 2D grid maps (e.g., ultrasonic sensor, 2D/3D LiDAR, and 3D depth camera), and thus it is independent of sensor modality.

III. SUBMAP FEATURE EXTRACTION
Given a submap image, we extract three features: a boundary, a free-space, and a symmetry feature. The boundary and free-space features can describe a geometric shape of the submap, whereas the symmetry feature can quantify overall shape of the submap. Our method explicitly considers the geometric and structural information of the submap concurrently, thereby having the advantage of being robust against noise.
Boundary feature: A submap can also be viewed as an object with a specific shape. Therefore, we define a shape function used in shape-based image classification to describe a distribution of a boundary area. We express the distribution of distance between two boundary points as a histogram by modifying the D2 function [14] used for 3D shape matching to a 2D version. Advantages of the D2 function include computational simplicity, invariant to rotation and translation, and insensitive to small perturbations.
Free-space feature: Since a free-space in maps indicates an area where robots can actually be located, the distribution of the free-space along with that of the occupied area provides very useful information for robot localization [12]. We represent the distribution of the free-space as a histogram by defining free-space nodes and calculating their shortest path distance from each node to the other nodes. Even if there are static or dynamic obstacles that were not in the free-space at the time of map building, the shortest path distance does not change much, thus it is insensitive to small perturbations.
Symmetry feature: A typical indoor environment can be classified into three types: corridors, rooms, and lobbies [15]. These spaces exhibit considerable symmetry due to structural characteristics of indoor buildings, and in particular, a reflection symmetry [16] can be a good feature to describe the overall structure of a place. Therefore, we compute a reflection symmetry score to quantify the structural feature of submaps and use it as a weighting factor for calculating submap similarity.
This section describes the extraction of the above three features from submaps and calculation of a similarity score between two submaps (Fig. 1).

A. Submap Partitioning
Assume that we have created a complete 2D occupancy grid map M using an available sensor such as a 2D laser scanner ( Fig. 2(a)). To locate robots, we need to compare the most recently created submap with some part of M, so the process of dividing the M into submaps is necessary. We split it into submaps based on junction nodes extracted from M.
First, a binary map M B is created and then distance transform [17] is applied for free-space detection on M. Since the raw distance transformed image contains many free-space regions, a morphological operator (erosion), followed by an image thinning technique [18] are applied to make 1-pixel wide edge map M E depicting only the major free-space. Finally, possible junction nodes are selected by performing junction test [19] on all nonzero pixels in M E , and clustered using DBSCAN [20] (Fig. 2(b)). The global map is thus expressed as the sum of submap images with a fixed radius r around each junction node (Fig. 2(c)), M ∀i∈J M S i , where S i is a i-th partitioned submap, and J M is the refined set of junction nodes. The submap radius r is dependent on range sensing capability of a sensor used.

B. Boundary Feature
A boundary of a submap (i.e., distribution of occupied pixels) represents the overall outer shape of a place, which is the main characteristic that distinguishes the place. We  perform the following procedure to quantify the boundary information.
1) Submap Binarization: Each pixel value of a submap represents a degree of occupancy. The binary image S B i is obtained by performing inverse-binary thresholding (S B i (u, v) = 1, if S i (u, v) < T B , otherwise 0) to remove ambiguous pixels inside the submap and thus we only consider the clearly occupied pixels.
2) Thinning: During map building process, structures such as walls may be expressed as densely occupied areas due to sensor noise or mapping errors. Therefore, a thinning algorithm [22] is performed on S B i to generate a boundary edge map S E i with 1-pixel wide. 3) Uniform sampling of boundary points: Here, the boundary indicates a set of nonzero pixels in S E i . We can reduce the amount of computation by reducing the number of boundary points P B i = {p 1 , p 2 , . . . , p N B } after removing adjacent ones through uniform sampling (Fig. 3(a)). 4) Construction of Histogram [14]: For all pairs of boundary points in P B i , we calculate the pairwise Euclidean distance and construct a N B × N B symmetric matrix as where each element of {d mn |m < n} is accumulated into the corresponding bin dmn of a 1-D histogram (Fig. 3(b)): where H b i is a normalized vector denoting histogram of D B i , and h b i (k) indicates the number of elements in the k-th bin. The number of bin N h b = 2r ∆ b is dependent on map size 2r and fixed bin width ∆ b . Fig. 3 illustrates different submaps and their corresponding boundary histograms. The histogram of submap 3 and that of the other submaps are significantly different in the peak pattern. However, submap 24 and 39 exhibit similar pattern changes despite having different shapes. Therefore, there is a limit to distinguishing all submaps with only boundary distribution, and it has implications that not only boundary information but also utilization of free-space should be considered to improve submap matching performance.

C. Free-Space Feature
A free-space provides useful information about internal structures of a place [12]. We quantify a distribution of the free-space by performing the following steps.
1) Extraction of Free-Space Nodes: The process of finding free-space nodes is analogous to the submap partitioning process (Section III.A) in that it goes through the process of image binarization, distance transform, and thinning. However, in this case, uniformly sampled free-space nodes V F i = {v 1 , v 2 , . . . , v N F } instead of junction nodes are obtained from the 1-pixel wide edge map of the submap (Fig. 4(a)).
2) Calculation of Shortest Path Distance: We can easily find a shortest path from any v m to v n by constructing an undirected graph G(V, E) whose vertices and edges are freespace nodes and their connections. The undirected graph G can be represented by an adjacency matrix A = [a mn ], where each element a mn is the Euclidean distance between vertices (i.e., free-space nodes). If an occupied pixel exists on the straight path v m ↔ v n , then a mn has an infinite value. We use Dijkstra's algorithm [23] to find a shortest path distance.
3) Construction of Histogram: In a similar manner to the construction of the boundary histogram (Eq. 1), for all pairs of free-space nodes in V F i , we calculate the shortest path distance and construct a N F × N F symmetric matrix as where SP is a function that returns the shortest path distance between v m and v n . Each element of {l mn |m < n} is accumulated into the corresponding bin lmn ∆ f of a 1-D histogram (Fig. 4(b)): where H f i is a normalized vector denoting histogram of L F i . The bin width ∆ f is a constant as in the boundary histogram. We found empirically that the maximum value of l mn is bounded within four times of submap radius r, so the number of bin N h f becomes 4r ∆ f . Submap 24 and 39 were not distinguished well by the boundary histogram (Fig. 3), however, when the free-space  histogram was applied, there is a more pronounced difference as shown in Fig. 4(b). On the other hand, interestingly, histogram pattern changes of submap 3 and 24 are similar unlike the case of the boundary histogram. This implies that the free-space histogram has complementary relationship with the boundary histogram, and the performance of distinguishing submaps can be improved through the combination of the two histograms.
The robustness of the free-space histogram against noise was tested by randomly placing N f number of virtual obstacles (e.g., moving persons, temporarily located indoor structures such as tables or chairs) which did not exist during map building process (Fig. 5(a)). Although the distribution of freespace nodes alters due to the addition of virtual obstacles, this does not significantly affect the shortest path distance, thus leading to negligible changes in the histogram (Fig. 5(b)).

D. Reflection Symmetry Score
Both the boundary and free-space histogram described above quantify a geometric shape of a submap, whereas a symmetry quantifies a global structural shape. Symmetry types include rotation, reflection, translation, and glide reflection symmetry [16]. Among these symmetries, a reflection is the most common type of symmetry found in indoor environments. We obtain two reflection symmetry scores for each submap through the procedure below.
2) Finding Axes of Symmetry: Since the indoor structure of the building is mostly rectangular, two axes of reflection where N α nz and |N α nz | are the set of nonzero pixels in S Eα i and its number, respectively. A score s β i by the reflection symmetry axis β can be calculated in a similar way.

E. Submap Similarity
Once a query submap S q is generated during robot navigation, it is compared against all submaps to find a best matched submap in the database. The submap matching score s is calculated by aggregating the three features, leading to explicitly including geometric and structural information as follows where ω b and ω f are the weights of the boundary and freespace histogram, respectively, whose sum is 1.0. The function d (A, B) is Bhattacharyya distance [25] of two histograms. The reflection symmetry axes (α, β) are independent of each other, thus Ω can be denoted as where η is a scaling factor and σ is an adjusting factor that controls the effect of structural feature on the submap matching score s. The submap similarity ρ is the reciprocal of s, so it increases as the two submaps are similar.

A. Datasets and Baseline Methods
To validate the proposed method, we used five publicly available datasets for our experiments (Table I) [21]. All maps are provided as images which consist of corridors, rooms, and lobbies, varying in size from approximately 45×16m to 125×100m with a resolution of 0.05m per pixel (Fig. 7). In general, the number of submaps should increase as the size of the map increases, however, fewer submaps are extracted for killian composed of long corridors because the submap database was built based on junction node extraction.
Shape Context: The shape context is an algorithm that describes global shapes in binarized images, and is mainly used for image matching, classification, and retrieval. The number of angular bins and radial bins are the main parameters of the shape context, and in our experiments, the matching performance was measured with 24 and 12, respectively.
SIFT/SURF+BoW: SIFT and SURF are methods for extracting local features and descriptors in images and are widely used in computer vision and robotics fields. A BoW in conjunction with the local feature extractor is mainly used for image classification, search, and scene recognition. In this paper, parameters of the local feature extractor were set so that no more than 300 features were extracted for each submap, and a codebook size of the BoW was set to 1024.
SIFT/SURF+TF-IDF: TF-IDF uses word frequency and inverse document frequency to weight the importance of each feature, and is generally known to outperform the BoW. The parameter setting was the same as that of the above BoW method.
SDF: Submap matching is performed using RANSAC based on keypoints and descriptors extracted from the local SDF geometry. Up to 200 keypoints were extracted per submap.

B. Performance Evaluation
First, a query submap S q was obtained by cropping an image with radius r around randomly selected location (u, v) on a global map, followed by randomly rotated. In addition, to verify robustness against noise, here we added two different types of noise which are a boundary noise and a free-space noise. In this paper, two-stage noise levels were defined according to the number of noise elements (N b , N f ), where N b and N f denote the number of the boundary noise and that of the free-space noise, respectively. Fig. 8 illustrates the query submapS q where noise is injected. The boundary noise is distributed around the randomly selected N b number of occupied pixels, simulating sensor noise or mapping errors. The free-space noise is placed on randomly selected N f number of unoccupied pixels in the form of a circle with radius r n or a square with width 2r n , denoting moving obstacles or temporarily installed internal structures.
The query submapS q was compared to all submaps in the database to find the best match based on the submap similarity ρ. If the difference between the location of the query submap and that of the submap with the highest similarity is within 1.0 m, then this match is determined to be a true match; otherwise, it is a false match. We performed 2,000 queries per dataset with parameters listed in Table II to construct precision-recall curves by varying the submap similarity threshold. Based on this, we quantitatively evaluated the performance of the proposed method by examining the area under precision-recall curve (AUPRC), recall at a precision of 0.9 (RP9), and F1 score.
The proposed method, SIFT+TF-IDF, and SDF showed reasonable performance in all datasets from the experiments in which only simple random rotation was added without noise injecting into the query submap ( Fig. 9, Table III). However, the performance superiority of the proposed method to the other methods was remarkable as the noise level increased. To sum up, when based on the three measurements of AUPRC, RP9, and F1 score, the proposed method compared to SIFT+TF-IDF which had best performance among baseline methods in the experiment of noise level 2 demonstrated the average performance improvement of 10.3%, 104.7%, and 6.3%, respectively.
Overall, the performance based on the fr079 and intel datasets was better than that based on the other datasets, which is due to relatively high proportion of distinct regions on the map (Fig. 7). In particular, in case of the fr079 and intel, the map is composed of rooms of various sizes, whereas most of areas of the killian dataset consists of long corridors, leading to inherently include many duplicated submaps in the submap database. Therefore, the performance based on the killian dataset was lowest among all datasets.
A maximum sensing range of a sensor used in localization process affects the determination of the submap radius r. A visible range of general indoor environments except for special environments such as long corridors or very large halls can  Weight of boundary histogram 0.9 ω f Weight of free-space histogram 0.1 η Scaling factor of reflection symmetry score 0.08 σ Adjusting factor of reflection symmetry score 120 be limited to within 10 m, so the experiments were performed with three submap sizes up to a maximum radius of 10 m (Fig.  10, Table. IV). As a result, we found that there were gradual performance improvements in all datasets without exceptions because distinct regions were more likely to be included as the submap radius increased.

V. CONCLUSIONS
The key contribution of this paper is the idea of extracting statistical signatures rather than keypoints to describe geometric and structural shape of submaps. In particular, by dividing a submap into a boundary and a free-space using existing computer vision algorithms, statistical values of each area were explicitly extracted. Moreover, reflection symmetry, which is often seen in building structures, was quantified in order to capture structural information of a submap. We explicitly integrated these three features to define a similarity score between two submaps. Our comparative experiments with the state-of-the-art image-based methods demonstrated that the proposed method was fairly effective at distinguishing between submaps especially when severe noise is injected to submaps.     Fig. 10. Precision-recall performance comparison with three submap radii. As the submap radius increases, the performance of the proposed method also increases. Overall, we achieved an average performance improvement of 10.3% (AUPRC), 104.7% (RP9), and 6.3% (F1 score), respectively, in experiments using five pre-built map publicly available.