SECTION I

## INTRODUCTION

FootSLAM is an algorithm that apparently challenges a well-established conjecture in navigation: “Navigation based on inertial sensors as sole means is subject to unbounded growth of position error over time.” This unbounded growth of position error has very practical implications, especially in mass market pedestrian navigation, where straightforward dead reckoning with low-cost inertial sensors results in hundreds of meters of error after a few minutes. If the navigation makes explicit use of human legged locomotion by detecting steps, the inertial sensors can be recalibrated during the rest phases of the foot, which significantly reduces the error growth to, say, a few meters after some minutes. Since the error is still a random walk even when this technique is applied, it typically amounts to hundreds of meters after a few hours. In contrast, we have shown in experiments that FootSLAM is able to maintain an error in the range of one meter in typical indoor environments over extended periods of time without noticeable growth of error. As we will explain in the course of this paper, FootSLAM actually does not violate the conjecture stated above. While the inertial sensors are the “sole (technical) means,” FootSLAM effectively augments the inertial sensors with other sensors, namely the human's perception. In dead reckoning, a current position estimate is computed from the previous estimate and the estimated relative movement (odometry). Given an ideal inertial sensor, with which the relative movement can be measured without error, the position can be determined at all times with the accuracy of the initial position. For nonideal sensors the errors are cumulative and the growth in uncertainty is unbounded over time. FootSLAM builds on prior work on pedestrian positioning using foot-mounted inertial measurement units (IMUs) and the simultaneous localization and mapping (SLAM) approach pioneered in robotics [1], [2].
The novelty is that FootSLAM uses no visual or other exteroceptive sensors of any kind (the only sensors are the accelerometers and gyroscopes of a foot-mounted IMU) and still is able to prevent unbounded error growth. While this is still an unproven conjecture from a theoretical standpoint, we were able to show in our experiments that a pedestrian's location and the building layout can be jointly estimated by using the pedestrian's IMU-based odometry alone.
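
The cumulative nature of dead-reckoning errors can be illustrated with a minimal sketch. It assumes a pedestrian walking in a straight line while the estimated heading drifts by a small constant amount per step; the function names, the step length of 0.7 m, and the drift value are illustrative assumptions and are not taken from our experiments.

```python
import math

def dead_reckon(step_lengths, headings, start=(0.0, 0.0)):
    """Accumulate 2-D positions from per-step length and heading (radians)."""
    x, y = start
    track = [(x, y)]
    for length, heading in zip(step_lengths, headings):
        # Each new position depends on the previous estimate plus the odometry.
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        track.append((x, y))
    return track

def final_error(n_steps, step_length=0.7, heading_drift_per_step=0.0):
    """Position error after a straight walk with a linearly drifting heading estimate.

    step_length and heading_drift_per_step are hypothetical values for illustration.
    """
    lengths = [step_length] * n_steps
    true_headings = [0.0] * n_steps
    est_headings = [k * heading_drift_per_step for k in range(n_steps)]
    tx, ty = dead_reckon(lengths, true_headings)[-1]
    ex, ey = dead_reckon(lengths, est_headings)[-1]
    return math.hypot(ex - tx, ey - ty)
```

With zero drift the error stays at zero, whereas any nonzero drift makes `final_error` grow with the number of steps, because every heading error is carried into all subsequent positions.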

Fig. 1. Track of IMU-based odometry (blue) after applying ZUPT pseudomeasurements compared to ground truth (red) and temporal evolution of the cumulative angular error of the odometry. At $t \approx$ 870 s the angular error starts to grow with approximately constant rate until $t\approx$ 1000 s. Data from data set 16 of [12], with AM1T3ND ZUPT. (a) Track. (b) Cumulative angular error.

### A. Motivation

Moving as a pedestrian constitutes the most ubiquitous element of transportation in human society. The efficient routing of pedestrians in public transportation hubs is a prerequisite for tightly coupled and efficient intermodal transportation. Personal navigation aids require location and map accuracy on the order of the physical extension of relevant structures (e.g., doors, stairs, aisles) in the environment, which is typically around one meter for indoor environments. Hence, the motivation for our work is to enable location determination and mapping for pedestrians in environments where other means of location determination, such as satellite navigation, other radio-based localization techniques, or mapping information, are often not available or too inaccurate. While this has been achieved for robotic platforms in the past, it has been an unsolved problem for pedestrians primarily for one reason: exteroceptive sensors, such as cameras or light detection and ranging (LIDAR) devices, are standard for robots, but problematic for pedestrian applications due to their mounting requirements, cost, and privacy concerns. Simultaneous localization and mapping for pedestrians, based solely on IMU data, would remove significant hurdles for mass market and professional applications by eliminating the need for exteroceptive sensors and reducing privacy issues.

Projecting the improvements in microelectromechanical system (MEMS)-based inertial sensors to the next ten or even 20 years, it is foreseeable that FootSLAM will no longer require the inertial sensors to be mounted on the foot. Instead, the sensors and processing power in a pocket-carried smartphone are likely to suffice to position the bearer with the same accuracy that FootSLAM achieves with foot-mounted sensors already today. In essence, users of ordinary smartphones would always know their position and simultaneously create a map of their environment, merely by walking around in it.

### B. Related Work

Automated dead reckoning has a long tradition in naval, aviation, and automotive position determination. Its application to automated determination of a pedestrian's position has been facilitated only recently by the advent of highly integrated MEMS-based IMUs. Specifically, its use in mass market applications has become commercially viable through the development of low-cost silicon turn rate sensors [3]. In order to cope with these sensors' errors in terms of time-varying biases and scale factors, Dissanayake et al. employed movement constraints based on a vehicle model within a Bayesian filter [4]. While their application domain was land vehicles, they effectively laid the foundation for online estimation of IMU sensor errors by using movement constraints, which would later become the central pillar of inertial pedestrian position determination. Early work on dead reckoning for pedestrians relied on simple step detection and assumed either a fixed step length or a correlation of step frequency and step length. While these methods do not employ a strapdown algorithm to solve for a full 3-D displacement vector of the foot, they are fairly robust and, typically in conjunction with an electronic compass, also work for sensors mounted on the hip or upper torso. Randell et al. improved the estimation of the step length by measuring the peak acceleration of the foot in each step and using it as additional input [5]. The shortcoming of these methods is their inability to correctly determine nontrivial steps, such as sidestepping. Foxlin recognized the possibility to detect and use the duration of the foot's rest phase for injecting zero-velocity updates (ZUPTs) as pseudomeasurements into a filter that models the IMU's errors [6] (Fig. 1).
Closely related to utilizing a platform's dynamic constraints, which may arise from its inertia or limited power and are typically constant over time and location, is the idea to exploit constraints of movement imposed by the environment. Such constraints may be caused by walls or other obstacles and are typically a function of the location. By using known building layouts, several groups achieved long-term stability in position determination of a pedestrian in 2-D [7], [8] and 2.5-dimensional [9] environments. Common to their approaches is the use of nonparametric sequential Monte Carlo filters (“particle filters”). Given the prerequisite of a sufficiently constraining environment and an accurate map, these methods achieve long-term error stability. A huge body of work exists in the robotics literature on the use of a wide range of sensors, such as sonar, laser ranging, and cameras to perform positioning of robots. The simultaneous localization and mapping (SLAM) problem was formulated to allow robots to navigate in a priori unknown environments [1]: a moving robot explores its environment and uses its sensor information to build a “map” of landmarks. Our work is closely related to the Rao-Blackwellized particle filtering approach employed in the FastSLAM algorithm [10]. Furthermore, we employ a probabilistic map that represents human motion in a 2-D hexagonal grid that is similar to an occupancy grid [11] but with a different purpose.

### C. Problem Statement

Existing approaches to infrastructure-less pedestrian position determination are either subject to unbounded growth of positioning error, or require one of the following: 1) a priori map information, or 2) exteroceptive sensors, such as cameras or LIDARs, with which traditional SLAM is performed. In contrast, we wish to achieve long-term stability of pedestrian positioning, i.e., to bound the positioning error, based solely on imperfect odometry that exhibits both angular and distance errors.

### D. Structure of the Paper

After briefly discussing the motivation, related work, and our problem statement, we first present the Bayesian formulation of FootSLAM in Section II. The original FootSLAM algorithm and its extensions PlaceSLAM and FeetSLAM are outlined in Section III. Section IV describes experiments performed to validate the approach, and presents and discusses results from these experiments. The paper closes with conclusions and an outlook in Section V.

SECTION II

## BAYESIAN DERIVATION

### A. Dynamic Bayesian Network Representation

We will formulate the problem as a dynamic Bayesian network (DBN). The key is to suitably represent the actor, i.e., the pedestrian in the system. When a pedestrian walks in a constrained environment, he relies mainly on visual cues in order to avoid walls and other obstacles. The pedestrian might be walking toward a particular destination such as an office, or might just be walking randomly in the accessible space in an office during a conversation. In robotic SLAM, the robot's movement is controlled by a series of inputs $u(t)$. These inputs are then used in the SLAM estimation as inputs to a probabilistic motion model. For FootSLAM, we assume that the human visual and cognitive systems interpret the environment and use it to guide motion: observed physical constraints such as walls influence intentions which result in a person deciding which steps to take. Fig. 2(b) shows a DBN that models relevant aspects of the system. All random variables are denoted in bold face. The step transition vector ${\bf U}_{k}$ has a special property: given the old and new poses ${\bf P}_{k-1}$ and ${\bf P}_{k}$, the step transition ${\bf U}_{k}$ is determined entirely, since knowledge of any two of the state variables ${\bf P}_{k-1}$, ${\bf P}_{k}$, and ${\bf U}_{k}$ determines the third. Inspecting the DBN, we can now ask which random variables might be measurable or indirectly observable. It has been shown that observing human visual sensory input is possible, as reported in [13] and more recently for dynamic visual input in [14] (see footnote 1). However, we assume that we have no means of directly observing the human visual system ${\bf Vis}$, nor can we directly measure where the person might actually want to go next $({\bf Int})$, even though step estimation by measuring electrical activity produced in skeletal muscles by electromyography (EMG) has been reported in [15].

Fig. 2. DBNs for classic (robotic) SLAM and FootSLAM showing three time slices, with random variables ${\bf M}$ (“map”), ${\bf Z}$ (“measurement”), ${\bf P}$ (“pose”), and ${\bf U}$ (“odometry”). The FootSLAM map implicitly encodes the environmental features that influence the pedestrian's visual impression $({\bf Vis})$ and intention $({\bf Int})$. ${\bf E}$ models the correlated errors of the step estimator. (a) Classic SLAM. (b) FootSLAM.
Fig. 3. Stylized trajectory of a pedestrian. Reporting of placestamps may be incomplete (e.g., “E” is missing after leg 9).

In FootSLAM, we make no assumptions as to how steps are measured, as long as the error processes can be modeled sufficiently. So far we have used inertial sensors, in particular, a foot-mounted IMU, that (differentially) measure the steps a person takes. For this case, step measurements ${\bf Z}_{k}^{U}$ are obtained in a manner as described in [8] and we assume that a suitable strapdown inertial navigation algorithm with a Kalman filter or similar algorithm is used. From the viewpoint of our DBN this estimate is a noisy measurement of the true step vector ${\bf U}_{k}$. The only other influence on the measurement ${\bf Z}_{k}^{U}$ is a state variable ${\bf E}_{k}$ that encodes the correlated errors of the step estimator.

### B. Sketch of Derivation

Our goal is to estimate the states and state histories of the DBN given the series of all observations ${\bf Z}_{1:k}^{U}$ from the foot-mounted IMU. More formally, we wish to compute the joint posterior $p({\bf P}_{0:k}{\bf U}_{0:k}{\bf E}_{0:k},{\bf M}\vert{\bf Z}_{1:k}^{U})$, which factorizes as

$$p\left({\bf P}_{0:k}{\bf U}_{0:k}{\bf E}_{0:k},{\bf M}\vert{\bf Z}_{1:k}^{U}\right)=p({\bf M}\vert{\bf P}_{0:k})\cdot p\left(\{{\bf P}\ {\bf U}\ {\bf E}\}_{0:k}\vert{\bf Z}_{1:k}^{U}\right).\eqno{\hbox{(1)}}$$

The expression for the map probability in (1) simplifies because the assumed knowledge of ${\bf P}_{0:k}$ makes the map ${\bf M}$ conditionally independent of ${\bf U}_{0:k}$, ${\bf E}_{0:k}$, and the measurements ${\bf Z}_{1:k}^{U}$, as follows from the DBN in Fig. 2(b) and the deterministic relationship linking ${\bf P}_{k-1}$, ${\bf P}_{k}$, and ${\bf U}_{k}$. We will express the second factor in (1) recursively in the sense of a Bayesian filter. It can be easily shown that the recursive formulation is

$$\eqalignno{p\left(\{{\bf P}\ {\bf U}\ {\bf E}\}_{0:k}\vert{\bf Z}_{1:k}^{U}\right)\propto{}&\,p\left({\bf Z}_{k}^{U}\vert{\bf U}_{k}{\bf E}_{k}\right)\cdot p\left(\{{\bf PU}\}_{k}\vert\{{\bf P}\ {\bf U}\}_{0:k-1}\right)\cr&\cdot p({\bf E}_{k}\vert{\bf E}_{k-1})\cdot p\left(\{{\bf P}\ {\bf U}\ {\bf E}\}_{0:k-1}\vert{\bf Z}_{1:k-1}^{U}\right).&\hbox{(2)}}$$

The recursion is usually begun with the pose ${\bf P}_{0}$ set to an arbitrary position and heading, since performing SLAM without any absolute heading or location information is invariant under rotation and translation. We assume suitable postprocessing to resolve rotation, translation (and scale) transformations.

It is clear from the DBN that the map must play a role in determining the second factor of (2), the pose and step transition probability. Marginalizing over ${\bf M}$, we write this factor as

$$I\buildrel{\wedge}\over{=}\int\limits_{\bf M}p\left(\{{\bf PU}\}_{k}\vert{\bf P}_{k-1},{\bf M}\right)\cdot p({\bf M}\vert{\bf P}_{0:k-1})\,d{\bf M}.\eqno{\hbox{(3)}}$$

If we are able to compute $p({\bf M}\vert{\bf P}_{0:k-1})$ and the influence of a map on the transition from ${\bf P}_{k-1}$ to ${\bf P}_{k}$, we can perform sequential Bayesian estimation of the map and pose.

SECTION III

## ORIGINAL ALGORITHM AND EXTENSIONS

### A. FootSLAM

In order to obtain an online particle filter, we sample from a proposal density, such as the one described in [16], and correct the weights using importance sampling. Following [17] and [18], it can be shown that the weight of the $i$th particle $w_{k}^{i}\propto w_{k-1}^{i}\cdot I^{i}$ to a very good approximation. We therefore apply a Rao-Blackwellized particle filter [10] based on (1): each particle $i$ represents $\{\{{\bf PUE}\}_{k}^{i},p({\bf M}\vert{{\bf P}_{0:k}}^{i})\}$.
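
The weight recursion $w_{k}^{i}\propto w_{k-1}^{i}\cdot I^{i}$ and the resampling step of a generic particle filter can be sketched as follows. The paper does not specify when resampling is triggered or which scheme is used; the effective sample size and systematic resampling shown here are common choices and are assumptions of this sketch, as are all function names.

```python
import random

def normalize(weights):
    """Scale the weights so that they sum to one."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("all particle weights are zero")
    return [w / total for w in weights]

def update_weights(prev_weights, factors):
    """w_k^i proportional to w_{k-1}^i * I^i, followed by normalization."""
    return normalize([w * f for w, f in zip(prev_weights, factors)])

def effective_sample_size(weights):
    """N_eff = 1 / sum_i (w^i)^2 for normalized weights (assumed resampling criterion)."""
    return 1.0 / sum(w * w for w in weights)

def systematic_resample(weights, rng=random):
    """Return the indices of the particles that survive systematic resampling."""
    n = len(weights)
    start = rng.random() / n          # single random offset in [0, 1/n)
    indices = []
    cumulative = weights[0]
    i = 0
    for m in range(n):
        u = start + m / n             # evenly spaced pointers
        while u >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices
```

A typical use would be to call `update_weights` once per step and to resample only when `effective_sample_size` falls below some fraction of the particle count; that threshold is a tuning choice not prescribed here.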

#### 1) Probabilistic Transition Map

We now introduce a probabilistic map, based on the probability of the pedestrian crossing transitions in a regular 2-D grid of adjacent hexagons of radius $r$. We choose hexagons because they are the polygons with the greatest number of edges that can be arranged to cover a 2-D area completely without overlap. Furthermore, six angular transitions appear an appropriate number of choices for which human motion would be reasonably independent between angles.

We restrict this space to the region visited by any particle and define $H_{h}$ as one of $N_{H}$ hexagons, where $h$ uniquely references a hexagon's position. We define the map ${\bf M}$ to be the set comprising all ${\bf M}_{h}$, where ${\bf M}_{h}$ is a vector of length 6 with each component denoting the transition probability

$${\bf M}_{h({\bf P}_{k-1})}^{e({\bf U}_{k})}\buildrel{\wedge}\over{=}P({\bf P}_{k}\in H_{j}\vert{\bf P}_{k-1}\in H_{h}), \quad {\hbox {where}}\ j\neq h\eqno{\hbox{(4)}}$$

for leaving the $h$th hexagon over the edge $e$ via ${\bf U}_{k}$ to an adjacent hexagon $H_{j}$. We assume that the map factors into local, conditionally independent components, and step ${\bf U}_{k}$ is only dependent on ${\bf P}_{k-1}$ and the local map ${\bf M}_{h({\bf P}_{k-1})}$. Writing $\tilde{h}$ for outgoing hexagon $h({\bf P}_{k-1})$, and $\tilde{e}$ for $e({\bf U}_{k})$, we compute the integral $I$ by integrating over ${\bf M}_{\tilde{h}}^{\tilde{e}}$ for the respective edge.
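
One possible way to realize the mappings $h({\bf P})$ and $e({\bf U})$ is sketched below. It assumes pointy-top hexagons addressed by axial integer coordinates and an edge numbering in which opposite edges differ by three; neither the orientation, the indexing scheme, nor the function names are prescribed by the text above.

```python
import math

# Axial-coordinate offsets of the six neighbours; the list index is the edge number e.
# This particular numbering is an assumption of the sketch.
NEIGHBOUR_OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def _cube_round(qf, rf):
    """Round fractional axial coordinates to the nearest hexagon centre."""
    sf = -qf - rf
    q, r, s = round(qf), round(rf), round(sf)
    dq, dr, ds = abs(q - qf), abs(r - rf), abs(s - sf)
    # Recompute the coordinate with the largest rounding error so that q + r + s = 0.
    if dq > dr and dq > ds:
        q = -r - s
    elif dr > ds:
        r = -q - s
    return (q, r)

def hex_index(x, y, radius):
    """Axial (q, r) index of the pointy-top hexagon of the given radius containing (x, y)."""
    qf = (math.sqrt(3.0) / 3.0 * x - y / 3.0) / radius
    rf = (2.0 / 3.0 * y) / radius
    return _cube_round(qf, rf)

def edge_between(h_from, h_to):
    """Edge number for a transition between adjacent hexagons, or None if not adjacent."""
    offset = (h_to[0] - h_from[0], h_to[1] - h_from[1])
    if offset in NEIGHBOUR_OFFSETS:
        return NEIGHBOUR_OFFSETS.index(offset)
    return None

def opposite_edge(e):
    """Number of the same physical edge as seen from the neighbouring hexagon."""
    return (e + 3) % 6
```

With this layout, a step whose end points fall into non-adjacent hexagons yields `None` from `edge_between` and would have to be split into a chain of adjacent transitions before the map is consulted.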

#### 2) Learning the Transition Map

Learning is straightforwardly based on Bayesian learning of probabilities of discrete random variables. Each time a particle makes a transition ${\bf P}_{k-1}^{i}\rightarrow{\bf P}_{k}^{i}$ across edge $\tilde{e}$ we count this transition in its local map of hexagon $H_{\tilde{h}}$. When computing the counts, we assume that observing a certain transition from an outgoing hexagon to an incoming one allows us to increment the counts for both the outgoing as well as the incoming one. This improves convergence and is the same as assuming that a person is likely to walk in either of the two directions. In order to incorporate prior information, we assume that $p({\bf M}_{\tilde{h}}^{\tilde{e}}\vert{\bf P}_{0:k}^{i})$ follows a beta distribution and integrating $I^{i}$ yields

$$I^{i}\propto{N_{\tilde{h}}^{\tilde{e}}+ \alpha_{\tilde{h}}^{\tilde{e}}\over N_{\tilde{h}}+\alpha_{\tilde{h}}}\eqno{\hbox{(5)}}$$

where $N_{\tilde{h}}^{\tilde{e}}$ is the number of times the $i$th particle crossed the transition, $N_{\tilde{h}}$ is the sum of the counts over all edges of the hexagon in this particle's map counters, and $\alpha_{\tilde{h}}^{\tilde{e}}$ and $\alpha_{\tilde{h}}=\sum_{e=0}^{5}\alpha_{\tilde{h}}^{e}$ are the prior counts. So far an increment of the time index $k$ is associated with a step that leads from one hexagon to an adjacent one. In reality, a step might keep the hypothesized pose in the hexagon or it might lead it over several hexagons. To address this we simply perform a weight update only when the hypothesized pose has stepped out of the last hexagon and apply multiple products of (5) for all edges crossed. Similarly, we update the counts of all edges crossed.
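
The step from the beta assumption to (5) can be made explicit as follows. The derivation uses only the standard conjugacy of the beta distribution with count data and the mean of a beta distribution; the numerical values in the last line (a prior count of one per edge and the counts $N_{\tilde{h}}^{\tilde{e}}=3$, $N_{\tilde{h}}=4$) are purely illustrative assumptions.

```latex
\begin{align*}
% Prior on the probability of leaving hexagon \tilde{h} over edge \tilde{e}
{\bf M}_{\tilde{h}}^{\tilde{e}}
  &\sim \mathrm{Beta}\!\left(\alpha_{\tilde{h}}^{\tilde{e}},\;
        \alpha_{\tilde{h}}-\alpha_{\tilde{h}}^{\tilde{e}}\right) \\
% Posterior after N_{\tilde{h}}^{\tilde{e}} crossings of \tilde{e}
% out of N_{\tilde{h}} recorded departures from \tilde{h}
{\bf M}_{\tilde{h}}^{\tilde{e}} \,\vert\, {\bf P}_{0:k-1}^{i}
  &\sim \mathrm{Beta}\!\left(N_{\tilde{h}}^{\tilde{e}}+\alpha_{\tilde{h}}^{\tilde{e}},\;
        (N_{\tilde{h}}-N_{\tilde{h}}^{\tilde{e}})
        +(\alpha_{\tilde{h}}-\alpha_{\tilde{h}}^{\tilde{e}})\right) \\
% The integral over the local map is the posterior mean
I^{i} \propto \int_{0}^{1} m \;
      p\!\left({\bf M}_{\tilde{h}}^{\tilde{e}}=m \,\vert\, {\bf P}_{0:k-1}^{i}\right) dm
  &= \frac{N_{\tilde{h}}^{\tilde{e}}+\alpha_{\tilde{h}}^{\tilde{e}}}
          {N_{\tilde{h}}+\alpha_{\tilde{h}}} \\
% Illustrative numbers: \alpha^{e}=1 for all six edges, N^{\tilde{e}}=3, N=4
  &= \frac{3+1}{4+6} = 0.4
\end{align*}
```

A particle that repeats a frequently observed transition therefore receives a larger factor than one that crosses an edge it has rarely or never crossed before, for which the factor approaches the prior ratio $\alpha_{\tilde{h}}^{\tilde{e}}/\alpha_{\tilde{h}}$.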

### Algorithm 1: FootSLAM

1:    for $i=1 \rightarrow N_{p}$ do

2:       ${\bf P}_{0}^{i}\leftarrow(x,y,h=0)$, where $x$, $y$, $h$ denote the pose location and heading in 2-D

3:       ${\bf E}_{0}^{i}\leftarrow$ draw from initial distr. of odometry error states

4:    end for

5:    for all time steps do

6:       $k\leftarrow k+1$

7:        for $i=1\rightarrow N_{p}$ do

8:         draw the $i$th particle from the proposal density $p({\bf E}_{k}\vert{\bf E}_{k-1}^{i})\cdot p({\bf U}_{k}\vert{\bf Z}_{k}^{U},{\bf E}_{k}^{i})$ from left to right.

9:        ${\bf P}_{k}^{i}\leftarrow{\bf P}_{k-1}^{i}+{\bf U}_{k}^{i}$

10:        $w_{k}^{i}\leftarrow w_{k-1}^{i}\cdot I_{k}^{i}$

11:         where $I_{k}^{i}\propto\prod_{\forall\ {\rm edges}}{{N_{\tilde{h}}^{\tilde{e}}}^{i}+\alpha_{\tilde{h}}^{\tilde{e}}\over N_{\tilde{h}}^{i}+\alpha_{\tilde{h}}}$, with the product taken over all edges crossed in this step

12:        for all edges crossed do

13:            for both hexagons joined by the edge $\tilde{e}^{\,i}$ do

14:                ${N_{\tilde{h}}^{\tilde{e}}}^{i}\leftarrow{N_{\tilde{h}}^{\tilde{e}}}^{i}+1$ w.r.t. the outgoing hexagon $\tilde{h}^{\,i}$

15:            end for

16:        end for

17:    end for

18:     normalize so that $\sum_{i=1}^{N_{p}}w^{i}=1$

19:    if resampling is required then

20:         perform resampling

21:    end if

22: end for
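
The weight update and the count update of Algorithm 1 for a single particle might be sketched as follows. The class and function names, the uniform prior count of one per edge, and the convention that the same physical edge carries number `(e + 3) % 6` in the neighbouring hexagon are assumptions of this sketch rather than details given in the algorithm.

```python
from collections import defaultdict

NUM_EDGES = 6

class TransitionMap:
    """Per-particle edge-crossing counts with a uniform prior count per edge."""

    def __init__(self, prior_per_edge=1.0):
        self.prior_per_edge = prior_per_edge          # hypothetical default value
        self.counts = defaultdict(lambda: [0] * NUM_EDGES)

    def weight_factor(self, hexagon, edge):
        """(N_h^e + alpha_h^e) / (N_h + alpha_h), following (5)."""
        # .get() avoids creating map entries for hexagons that were only queried.
        edge_counts = self.counts.get(hexagon, [0] * NUM_EDGES)
        numerator = edge_counts[edge] + self.prior_per_edge
        denominator = sum(edge_counts) + NUM_EDGES * self.prior_per_edge
        return numerator / denominator

    def record_crossing(self, hex_from, edge, hex_to):
        """Increment the count for both hexagons joined by the crossed edge."""
        self.counts[hex_from][edge] += 1
        self.counts[hex_to][(edge + 3) % NUM_EDGES] += 1

def step_update(weight, tmap, crossings):
    """Weight one particle, then learn from the same crossings.

    crossings is a list of (hex_from, edge, hex_to) tuples for this step.
    """
    for hex_from, edge, _ in crossings:
        weight *= tmap.weight_factor(hex_from, edge)
    for hex_from, edge, hex_to in crossings:
        tmap.record_crossing(hex_from, edge, hex_to)
    return weight
```

Weighting before counting mirrors the order of the corresponding lines in Algorithm 1, so that a crossing does not reinforce its own weight within the same step.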

### B. PlaceSLAM

A straightforward extension of FootSLAM is the detection of physical proximity to reliably detectable “places” [19]. These detection hints may either stem from an assisting user or from additional sensors, such as a radio-frequency identification (RFID)-tag reader or a camera that recognizes salient visual features. It is important to recognize that PlaceSLAM does not require continuous measurements, but can make use of sporadically incoming place detection incidents. A user may choose suitable places and hint their proximity when passing them. In subsequent walks, the person may signal when revisiting these places. Cues can be prominent items such as fire extinguishers or anything that would allow reliable and repeatable recognition of location.

Depending on the distinguishability of the hints or features, different variants of PlaceSLAM exist. In a trivial case, the true locations of the places are known to the system. In less trivial cases, the locations of placemarks are not known and the association “quality” of place identifiers may range from perfect a priori association to no a priori association at all. Fig. 3 shows a stylized trajectory of a pedestrian. Circles represent places. Letters and colors are identifiers. On the right side of the figure we see three possible placestamp sequences as input to the estimator: one for which the placestamps carry the unique letters, i.e., perfect association; one with partial association (colors); and finally, one in which only the fact that some place is seen is reported, i.e., unknown association. Note that we do not require uniqueness of the identifier. One can imagine a situation where a pedestrian signals every time he walks through any door. The requirement in this case is that such places are sufficiently separated in space. This is the most challenging and general case for PlaceSLAM.

Fig. 4. Inertial measurements of several partially overlapping walks are automatically combined by FeetSLAM, a collaborative form of FootSLAM, to estimate the map shown above. (a) MIT's Stata Center, a large academic building with many nonrectangular structures. (b) Maximum a posteriori (MAP) FeetSLAM map (cyan) resulting from several walks on one floor of Stata Center.

For each time step, the standard FootSLAM proposal step and weight update are performed for each particle. If no placestamp has been reported, we continue with the normal FootSLAM algorithm. If a placestamp was reported, we distinguish two cases. 1) If the particle position is farther than a predefined threshold distance $d_{\min}$ from all previously logged places in the particle's own place map, we assign a new unique identifier to this place. 2) Otherwise, we select the identifier of the place in the particle's place map closest to the particle's current position. In both cases, we then weight the particle with the product of the FootSLAM weight and the PlaceSLAM weight, and subsequently update the location of the place according to its previous location distribution and to the location of the particle [19].
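
The case distinction above, for the variant with unknown association, can be sketched as a nearest-neighbour test against the particle's own place map. The function name, the integer identifiers, and the simplification of storing a single point per place are assumptions of this sketch; the PlaceSLAM weight and the update of the place location distribution described in [19] are deliberately left out.

```python
import math

def associate_place(particle_pos, place_map, d_min):
    """Return (place_id, is_new) for a placestamp reported at particle_pos.

    place_map maps integer place identifiers to (x, y) locations and belongs
    to one particle. It is modified in place when a new place is created.
    """
    nearest_id, nearest_dist = None, math.inf
    for place_id, (px, py) in place_map.items():
        dist = math.hypot(particle_pos[0] - px, particle_pos[1] - py)
        if dist < nearest_dist:
            nearest_id, nearest_dist = place_id, dist
    if nearest_dist > d_min:
        # Case 1: farther than d_min from every logged place, so create a new one.
        new_id = max(place_map, default=-1) + 1
        place_map[new_id] = particle_pos
        return new_id, True
    # Case 2: reuse the identifier of the closest logged place.
    return nearest_id, False
```

Because every particle keeps its own place map, two particles may associate the same placestamp with different places; the weighting step is what favours the hypotheses in which revisits line up.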

### C. FeetSLAM

If multiple data sets, possibly from multiple pedestrians, are combined, both coverage and accuracy can be increased. FeetSLAM, an extension to FootSLAM, combines multiple odometry data sets in an iterative process [20]. The result of each iteration enters the next iteration through (5) in the form of prior information. This approach is similar to, and in fact inspired by, the iterative “Turbo” decoding principle applied in communication systems [21], which in turn is an instance of loopy belief propagation [22], [23]. Since the iterative process allows the data sets from walks of many different pedestrians to be combined with tractable complexity, FeetSLAM enables the computation of community-generated maps. In principle, these community-generated maps could form a map database with global coverage.
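
How results of one iteration could enter the next one as prior counts in (5) is sketched below. The sketch assumes that the maps of all walks have already been brought into a common coordinate frame and that the prior for one walk is formed from the counts of all other walks only, by analogy with extrinsic information in Turbo decoding; both points, as well as the function name and the base prior of one, are assumptions here and should be checked against [20].

```python
def prior_from_other_walks(maps, target, base_prior=1.0, num_edges=6):
    """Prior counts for walk `target`: a base prior plus the counts of all other walks.

    maps is a list with one dict per walk, mapping a hexagon index to a list of
    per-edge transition counts. All maps are assumed to share one coordinate frame.
    """
    prior = {}
    for j, counts in enumerate(maps):
        if j == target:
            continue                      # a walk does not inform its own prior
        for hexagon, edge_counts in counts.items():
            acc = prior.setdefault(hexagon, [base_prior] * num_edges)
            for e, n in enumerate(edge_counts):
                acc[e] += n
    return prior
```

The returned dictionary would play the role of $\alpha_{\tilde{h}}^{\tilde{e}}$ for the hexagons it contains, with hexagons not visited by any other walk falling back to the base prior.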

SECTION IV

## EXPERIMENTAL VALIDATION

In order to investigate the conjecture that FootSLAM is able to bound the growth of positioning error, we carried out initial experiments in an office building environment. A pedestrian was instrumented with a foot-mounted IMU and performed three walks, each of roughly 10-min duration, within this environment. We recorded ground truth for two positions at opposite corners of the main corridor by timestamping the event every time the pedestrian passed these positions. All data sets were recorded and processed offline. A similar data set with additional placemarks has been recorded in the same environment. A detailed analysis of the results can be found in [19]. FeetSLAM has been tried in larger and more complex environments. Fig. 4 shows a map of an environment that included straight and curved corridors, nonrectangular crossings, and several loops [20].

### A. Discussion of Results

#### 1) Performance

A learned map (manually translated, rotated, and scaled) for hexagon radius $r=$ 0.5 m is shown in Fig. 5. In the visited areas, this map reflects the real path and is accurate to about 1–3 m, with better accuracy in the corridors that were frequented more often. The walk intentionally remained out of “loop closure” in the corridor for some time. The particles start to converge once the user backtracks or revisits a region for about 10 m. A sufficient number of hexagons have to be revisited once or twice for a usable map to emerge. This governs the required duration of a walk and fits with what a person typically covers in an office day. Accuracy in any case is related to the physical structure dimension, such as corridors and doors, which is about 1–2 m. The error evolution for FootSLAM is shown in Fig. 6. Our coordinate system origin was both the starting point and one of the reference points and we manually corrected for rotation ambiguity. With a sufficient number of particles we achieve an accuracy of approximately 2 m at the two reference points. Without FootSLAM, we see unbounded error growth after some time—our system coasted without too much error for about 300 s. Durations from 30 to 300 s are typical and suggest that without maps the particle filter can bridge areas like large halls where there are no features for FootSLAM to map. We expect FootSLAM to require a certain minimum average restrictedness of motion. Nevertheless, it can coast over some open areas given enough particles. To achieve accurate mapping, a relatively large number of particles (> 10 000) is necessary. Our current implementation runs roughly in real time for about 30 000 particles on a standard personal computer (PC). In our proposal function, we drew ${\bf E}_{k}^{i}$ from two independent random walk processes to model heading bias and heading rate bias error states [16]. 
The additive error in 2-D space between ${\bf Z}_{k}^{U}$ rotated according to ${\bf E}_{k}$, and ${\bf U}_{k}$, was white and Gaussian. In addition to SLAM-inherent rotation and translation invariance, FootSLAM is subject to a map scaling error. In FootSLAM, this error is due to biases in the IMU sensors, occasional erroneous ZUPTs, and subsampling/clipping of the IMU signals that affect the step length estimation, and is also a result of particles exploring hypotheses of different lengths. In our quantitative evaluation and videos [24], we did not use an individually tuned length correction factor; we applied only a constant factor of 1.15, which we have established quite reliably for that sensor setup. For the illustrations in the known floor plans, we adjusted the scaling so that the resulting map fits the known floor plan. This additional scaling adjustment was less than 10%. Scaling can be automated when tracks are anchored to outdoor Global Positioning System (GPS) measurements or other absolute positioning systems.
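
A proposal of the kind described above can be sketched as follows. The text fixes only that the two error states follow independent random walks and that the rotated measurement is perturbed by white Gaussian noise; the way the two states enter the step (rotation of the displacement by the heading bias, subtraction of the heading rate bias from the heading change), the sign conventions, and all parameter names are assumptions of this sketch.

```python
import math
import random

def propose_step(z_step, error_state, sigma_bias, sigma_rate, sigma_xy, rng=random):
    """Draw E_k given E_{k-1}, then U_k given Z_k^U and E_k.

    z_step is the measured step (dx, dy, dheading); error_state is
    (heading_bias, heading_rate_bias), both in radians.
    """
    bias, rate = error_state
    # Two independent random walks for the correlated odometry error states.
    bias += rng.gauss(0.0, sigma_bias)
    rate += rng.gauss(0.0, sigma_rate)
    dx, dy, dheading = z_step
    # Rotate the measured displacement by the hypothesized heading bias ...
    c, s = math.cos(bias), math.sin(bias)
    # ... and add white Gaussian noise in 2-D.
    ux = c * dx - s * dy + rng.gauss(0.0, sigma_xy)
    uy = s * dx + c * dy + rng.gauss(0.0, sigma_xy)
    return (ux, uy, dheading - rate), (bias, rate)
```

Each particle would call `propose_step` with its own error state, so that the particle cloud explores different heading error histories for the same measured odometry.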

Fig. 5. A map resulting from FootSLAM based on IMU data alone. Shown is an overlay of the posterior (i.e., weighted average) map in shades of gray and the maximum a posteriori (MAP) map (i.e., the map of the “best” particle) in black. Ovals highlight errors with arrows indicating roughly the error vector. The reference building layout is plotted for comparison.
Fig. 6. Relative position accuracy evaluation for FootSLAM—each curve is a single run on one data set.

#### 2) Privacy Implications

Given the current performance of FootSLAM and projecting the improvements in MEMS-based inertial sensors to the next ten, or even 20 years, a range of significant privacy implications for practically everybody arises, the solution of which is beyond the scope of this paper. We mention them for completeness and believe that they may require some form of regulation or even legislation in the coming years. Whoever has access to the stream of IMU data can determine the pedestrian's current position and the historic trajectory, as well as the layout of the environment. Since FootSLAM can position the user without any external signal or interaction with infrastructure, all sensing, computation, and user interaction can be carried out on the personal device, theoretically with full control over all data by the user. For this reason, we suggest that data from inertial sensors should be subject to the same access constraints as data from location sensors such as GPS.

Perhaps the most elusive privacy or security threat posed by the existence of FootSLAM is one that is virtually impossible to prevent or even detect: covert mapping of environments such as private, industrial, or governmental premises is hard to counter, since FootSLAM removes the need for any technical exteroceptive sensor that could be found in a nondestructive inspection. On the other hand, FootSLAM may, in terms of privacy, be the acceptable compromise for indoor navigation on such premises, since the IMU data are most likely to be considered less sensitive than the images taken by a pedestrian-worn camera in visual SLAM approaches for pedestrian navigation.

SECTION V

## CONCLUSION AND OUTLOOK

As of today, FootSLAM is capable of achieving stable position determination and accurate map generation solely based on an IMU in environments without any a priori knowledge. Extensions such as PlaceSLAM and FeetSLAM have been introduced and have been shown to work in a wide range of realistic experiments.

We foresee several interesting avenues for further research that will extend this work. So far we have concentrated on feasibility without optimizing for computational complexity. We see the main potential of FootSLAM in its collaborative form (FeetSLAM). In order to operate FeetSLAM in an economically viable way on a global scale, and in a 3-D environment, reducing required memory and computational effort is desirable. This holds both for a centralized approach in which most of the computation is performed on large-scale server farms as well as in a decentralized approach in which data and computation are shared among mobile devices. Any improvement in the underlying odometry not only improves the accuracy but also reduces computational effort in FootSLAM. Since significant improvements with respect to the accuracy of MEMS-based inertial sensors are to be expected in the next ten years, the future of inertial-based pedestrian navigation holds promise. In the not too distant future, it should be possible to reduce the dependency on “naïve” ZUPTs, which require a foot-mounted IMU. This will allow for less restrictive mounting on the hip or upper body. Ideally, the user will be freed from any constraints on where to put the device, and future algorithms will be able to robustly estimate the sensor's trajectory, independently of whether it is attached to the belt or just loosely put into a pocket as part of an arbitrary mobile device. Should cold-atom interferometry [25], which has already been applied in proof of concepts for inertial sensors with up to six axes [26], one day lead to practical devices, localization will require very little additional information for estimating and correcting the remaining errors. A FootSLAM-like approach would likely suffice to provide this additional information and do so with very little computational effort. 
Apart from these technical issues, interesting questions arise concerning ownership of data: if Alice walks through Carol's home or company, is she entitled to use FootSLAM to generate a map, use it herself, share it with Bob, or even publish it?

### Acknowledgment

The authors would like to thank D. Rus, J. Leonard, and B. Julian of the Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), Cambridge, for very valuable theoretical input, discussions, and their support in obtaining the measurements in Stata Center.

## Footnotes

This work was supported by the European Community's FP7 Programme (FP7/2007–2013) under Grant 215098 of the “Persist” Collaborative Project and Grant 2574943 of the “Societies” Project.

The authors are with the Institute of Communications and Navigation, DLR Oberpfaffenhofen, 82234 Wessling, Germany (e-mail: Michael.Angermann@dlr.de; Patrick.Robertson@dlr.de).

Footnote 1: A tempting hypothetical experiment would be to record blood oxygen level-dependent (BOLD) signals of the occipitotemporal visual cortex, and then to investigate if Bayesian estimation of the location of the person or the map of the environment is possible. To validate this in the absence of portable functional magnetic resonance imaging (fMRI), one could draw on video playback to a subject in a stationary fMRI device.
