Simultaneous Localization and Mapping for Inspection Robots in Water and Sewer Pipe Networks: A Review

At the present time, water and sewer pipe networks are predominantly inspected manually. In the near future, smart cities will perform intelligent autonomous monitoring of buried pipe networks, using teams of small robots. These robots, equipped with all necessary computational facilities and sensors (optical, acoustic, inertial, thermal, pressure and others) will be able to inspect pipes whilst navigating, self-localising and communicating information about the pipe condition and faults such as leaks or blockages to human operators for monitoring and decision support. The predominantly manual inspection of pipe networks will be replaced with teams of autonomous inspection robots that can operate for long periods of time over a large spatial scale. Reliable autonomous navigation and reporting of faults at this scale requires effective localization and mapping, which is the estimation of the robot’s position and its surrounding environment. This survey presents an overview of state-of-the-art works on robot simultaneous localization and mapping (SLAM) with a focus on water and sewer pipe networks. It considers various aspects of the SLAM problem in pipes, from the motivation, to the water industry requirements, modern SLAM methods, map-types and sensors suited to pipes. Future challenges such as robustness for long term robot operation in pipes are discussed, including how prior knowledge, e.g. from geographic information systems (GIS), can be used to build map estimates and improve multi-robot SLAM in the pipe environment.


I. INTRODUCTION
Water is one of our most precious natural resources. Pipe networks transport water between sources and destinations, and similarly, sewer and drainage pipes transport waste products away from the customer to processing plants. Inspection and maintenance of water pipe networks [1] is crucial for maintaining a robust water supply and conserving the resource, and in the case of wastewater preventing contamination from leaking sewer pipes and removing blockages. In the UK, the buried pipe network for water and wastewater is around 0.8 million kilometres in combined length [2], whilst the USA has 1.2 million miles of water supply mains and a similar amount of sewer pipes [3]. Investment in water and waste infrastructure is correspondingly large: over £250 billion is invested in UK water infrastructure [4], whilst the USA Environmental Protection Agency estimates that $271 billion must be invested over the next 20 years for wastewater/storm-water upgrades and $384 billion for drinking water upgrades [5]. Failure in pipe networks, in terms of a pipe leak, burst or blockage, can lead to severe disruption, including loss of water supply and road closures whilst the damage is repaired. It is estimated that over 3000 million litres of water is lost to leaks every day in the UK [6], and about 900 billion gallons of untreated sewage is discharged into USA waterways each year [7]. Therefore, continuous inspection, monitoring of deterioration, and detection and localization of damage in water and sewer pipes is of utmost importance.
The associate editor coordinating the review of this manuscript and approving it for publication was Saeid Nahavandi.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Currently, there is no mobile robot technology in real world use in industry that can autonomously monitor pipes to detect and localise defects over a large spatial scale and a long time duration. Industry methods used to monitor the pipes tend to be manually operated, tethered systems, such as in-pipe closed-circuit television (CCTV) inspection for sewer pipes [8] and above-ground technologies including ground penetrating radar [9], [10] and electromagnetic location (EML) [11] for both water and sewer pipes (and other underground infrastructure) [12].
Robotic devices have been developed in industry for water pipe inspection. Sahara [13], a tethered device, and Smartball [14] are among the most popular. Smartball is a newer, untethered technology that relies on a free-flowing (rolling) inspection. However, these two robotic technologies tend to be limited to specific short sections of pipe for one-off inspections.
Previous surveys focus on inspection in water pipes [15], [16], sewer pipes [17], small diameter pipes [18], and briefly on mapping for underground pipes [19], but the main emphasis is on the types of sensors and surveying technologies depending on the types of pipelines. There is a gap for a focused survey on state-of-the-art robot localization and mapping in water and sewer pipe networks, which is addressed here.
Robot mapping and localization methods are usually based on simultaneous localization and mapping (SLAM), where raw sensor data is processed in a front-end for feature extraction, data association and loop closing, and robot location and map estimates are produced by a back-end optimization algorithm. SLAM is useful for both robot control and navigation, as well as in the context discussed here of mapping the pipe network and localizing defects. The SLAM problem and its classical solution methods are well reviewed in [20], [21] and more recent approaches and future challenges are presented in [22]. The aim of this paper is to survey the state of the art in SLAM in pipe networks. The focus is on water and sewer pipes but we also refer to other pipes (e.g. gas) where relevant work has been done.
The remainder of the paper is structured as follows. In section II we describe the particular challenges of the pipe environment for SLAM. In section III we describe the high-level requirements of the water industry and relate this to technical requirements for SLAM in pipes. In section IV we provide an overview of methods for SLAM. Section V reviews map representations and discusses their suitability for pipes. Section VI reviews different sensors used for SLAM in pipes. Future challenges are considered in section VII. Finally, conclusions are given in section VIII.

II. CHALLENGES FOR SLAM IN WATER AND SEWER PIPE NETWORKS
SLAM is fundamentally a difficult problem to solve because to estimate a map you need a good estimate of your location, and to estimate your location you need a good estimate of the map. Odometry methods can be used to estimate location but the estimate drifts over time. SLAM is particularly challenging therefore when exploring new areas, or areas that lack discriminative features used to recognise places to correct odometry drift. To function reliably, a SLAM system should be robust in all of its components: front-end sensing (where it is generally thought that the use of multiple sensor-types improves robustness), landmark recognition, and back-end optimization algorithms that produce the location and map estimates. These are general challenges for SLAM but there are a number of specific challenges for SLAM in water and sewer pipes that are described in this section (and summarised in Table 1).
Water and sewer pipes are typically buried underground, meaning that robots in these pipes cannot receive GPS signals to estimate their location. GPS is one of the most popular and standard methods of localization, and is also commonly used for drift correction of dead reckoning sensors such as those based on odometry and inertial sensing. The lack of GPS signals makes the problem of SLAM in pipes far more challenging than typical outdoor scenarios.
Water and sewer pipes are often relatively small in diameter, with the majority of water pipes worldwide in the region of 100-150 mm in diameter, and the majority of sewer pipes in the UK of 300 mm or less in diameter (see section III for more details). This limits the size of a robot, particularly its sensor payload, computational hardware and batteries. This means that a typical pipe robot is likely to be limited to a small number of sensors and that certain types of sensor might be unsuited to the environment, for instance those that are not readily miniaturised or consume large amounts of power.
The insides of water and sewer pipes are difficult environments to navigate through: they are dark, water-filled (always in water distribution pipes and with time-varying levels of waste-water in sewer pipes), with possible occlusions of sensors occurring in sewers from waste products. This particularly complicates SLAM based on vision, because a light source is needed. The possibility of the dirty environment fouling the sensors also raises the possibility of sensor failure, so robustness in sensing and navigation, and failure-aware robotics, are critical issues.
Mobility also presents a challenge because water pipes are pressurised and it may be difficult for a robot to move against the flow, whilst sewer pipes move flow down the gradient and consequently connections in manholes can often have significant drops in height at the inlet pipe, from the centimetre scale up to metres. So, for both water and sewer pipes it is easier to move in one direction (with flow, or down the gradient respectively), which can impact active SLAM methods, where the goal is to actively explore and map the environment.
An additional challenge is the lack of accurate maps of pipe networks. Whilst water utilities do usually possess pipe network maps, in the form of top-down, 2D line drawings (see Fig. 1), they can often contain errors, due to: 1) discrepancies arising between planned replacement (upon which maps are usually based) and actual installation on-site, 2) loss of records (pipes can be tens of years old), 3) lack of precise record keeping. Errors in a map can be misleading to a SLAM system, potentially causing incorrect data associations that can in turn affect a fragile back-end estimation algorithm, causing the whole system to fail. Therefore, a key challenge is how to incorporate prior map knowledge into the system in ways that will be both beneficial and robust.
SLAM algorithms usually rely on reducing uncertainty in the map and robot location by taking successive measurements of the same (static) map features from different positions. However, the robot movement in pipes is restricted to predominantly one dimensional (1D) movements along the pipe, whilst the pipe network itself exists in a three dimensional (3D) space. This restricted movement means that the same map features cannot be observed from many perspectives to help reduce map and localization uncertainty.
The inside of a pipe tends to be a feature sparse environment, lacking distinctive landmarks that can be reliably and repeatedly detected. This is a particular problem for sensors that observe the external world, like cameras and laser scanners, although sensors based on internal measurements of motion such as inertial measurements and wheel odometry are unaffected by this problem. For instance cameras are often used for feature-based visual odometry, where features common to successive image frames are used to estimate the robot pose. Insufficient features can cause such an algorithm to fail.
Loop closing is an important component of a SLAM system, which refers to recognising places when the robot returns to a previously visited location. Loop closure enables drift errors to be corrected and so improves the accuracy of a SLAM system. Loop closing is likely to be more challenging in pipes because it is a highly homogeneous environment with little variation in visual features, and similar types of structure and geometry repeated throughout the network, i.e. the cylindrical shape of pipes and the standard shapes of e.g. pipe joints, junctions and manholes. The pipe environment is therefore prone to perceptual aliasing, where different places in the environment generate a similar perceptual footprint. This is a challenge for SLAM in pipes because the robot might be prone to false positives in loop closure, where the robot mistakenly recognises a place and closes a loop, and false negatives, where the robot fails to recognise that it has returned to a previously visited location, preventing successful loop closures.
Finally, it is worth noting that amongst the various challenges for SLAM in pipes, there are also some advantages. For instance, it is typically the case that sewer pipes are laid in straight lines and that changes of direction occur at points such as manholes, which should help avoid drift in heading estimates. Also, manholes for sewers, and fire hydrants for water pipes, occur relatively frequently (approximately tens of metres apart) and can be accurately mapped from above-ground, which then provides known reference points when correctly recognised and data associated from within the pipe. Therefore, there are some aspects of the environment that can be exploited to simplify the SLAM problem.

III. WATER INDUSTRY REQUIREMENTS FOR PIPE NETWORK MAPPING AND DEFECT LOCALIZATION
In this section we consider water industry requirements and technical requirements for robot mapping and localization in pipe networks. Specifications quoted in the following section of high-level and low-level requirements are guided by published literature (discussed in the next sections) and interactions with key stakeholders through personal correspondence and knowledge sharing events. These stakeholders include water utilities' representatives and water industry technology companies, which in turn interact, and are guided by, other stakeholders such as the customers, regulatory bodies and the government.
The regulatory body for the UK, the Water Services Regulation Authority or OFWAT, has published requirements that list resilience, and in particular operational resilience, as a key requirement [23]. Operational resilience means reducing the probability of water supply interruptions and wastewater flooding, as well as mitigating the impact of any disruption through efficient handling, good communication and quick recovery.
The requirement of operational resilience in the water industry has many different contributing factors but the focus of this survey is primarily on two key aspects that robot mapping and localization can aid with: 1. pipe network mapping, so that the water utilities know where their assets are located and 2. defect localization, so that water utilities know where to target repairs, especially when this incurs the cost and disruption of excavating in the street.

A. PIPE NETWORK MAPPING
The locations of buried pipes are usually not fully known by the water and sewer companies responsible for managing pipe networks. This can be due to a number of reasons, such as the pipe locations not being recorded during installation, or the information not being recorded accurately, or the information being lost over time. Therefore pipe network mapping is an essential task for robot inspection systems so that water utilities know where their assets are located.
A pipe network map can be described using a few key variables: 1) pipe coordinates in 3 dimensions, i.e. X, Y, Z positions, 2) pipe diameter, d, 3) pipe gradient, g.
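As a minimal sketch of how these map variables might be stored, the following Python record holds the 3D end coordinates and diameter of one straight pipe and derives the gradient from them. The class name `PipeSegment`, its fields and the example values are hypothetical and chosen for illustration only; they are not taken from any utility data format.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class PipeSegment:
    """One straight pipe between two nodes (hypothetical record layout)."""
    start: tuple      # (X, Y, Z) of the upstream end, in metres
    end: tuple        # (X, Y, Z) of the downstream end, in metres
    diameter: float   # pipe diameter d, in metres

    def plan_length(self) -> float:
        """Horizontal (top-down map) length of the segment."""
        return math.hypot(self.end[0] - self.start[0],
                          self.end[1] - self.start[1])

    def gradient(self) -> float:
        """Gradient g as rise over horizontal run (negative = falling).

        Assumes the segment is not vertical, i.e. plan_length() > 0.
        """
        return (self.end[2] - self.start[2]) / self.plan_length()


# Illustrative values only: a 150 mm pipe falling 0.5 m over a 50 m run.
segment = PipeSegment(start=(0.0, 0.0, -2.0), end=(30.0, 40.0, -2.5),
                      diameter=0.15)
```

A network map would then be a collection of such segments joined at shared nodes such as manholes or hydrants.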
Surveys for existing methods already have designated accuracy levels that in-pipe robots will have to compete with. For instance the British Standards Institution (BSI) has produced the publicly available specification (PAS) PAS 128 [12], for underground utility detection, verification and location, which has the accuracy levels specified in Table 2. Similarly, the American Society of Civil Engineers (ASCE) has produced the Standard Guideline for the Collection and Depiction of Existing Subsurface Utility Data [24], which gives four utility detection quality levels, shown in Table 3.

B. PIPE DEFECT LOCALIZATION
There are two main types of defects that need to be localised in pipe networks: 1) minor defects, where knowing their location is important to monitor the defect over time, 2) major defects, where finding their location is essential for the repair process, either conventionally by excavation or using trenchless technologies. Excavation size should be kept to a minimum to reduce costs (including reinstatement), minimise disruption and limit the potential to damage adjacent buried utilities. Discussions with water utilities suggest a sub-0.5 metre accuracy is desirable for locating defects, to guide excavation, which defines an accuracy requirement for a robot SLAM system. One key point though, emphasised by industry, is that there is a preference for excavating in the correct location at the first attempt, i.e. a single larger excavation would generally be preferred to excavating multiple smaller holes at incorrect locations. This means that the SLAM system should correctly characterise its location uncertainty via probabilistic methods. The time-scale of reporting defects and subsequent repair varies depending on the type of fault and the associated severity. Mapping the pipe network can be dealt with over a long time-scale (months or years) and will be continually ongoing. Blockages in sewer pipes and small leaks in water pipes can be dealt with over a medium time-scale (weeks to months). Bursts in water pipes need to be dealt with in a short time-scale (days).
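The preference for a single, correctly placed excavation can be related to location uncertainty with a simple calculation. The sketch below assumes, purely for illustration, that the error in the defect position along the pipe axis is one-dimensional and Gaussian with standard deviation `sigma_m`; the function name and the example value of 0.25 m are hypothetical.

```python
from statistics import NormalDist


def excavation_length(sigma_m: float, confidence: float = 0.95) -> float:
    """Length of an excavation, centred on the estimated defect position,
    that contains the true position with the given probability.

    Assumes a zero-mean 1D Gaussian position error along the pipe axis.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Two-sided interval: half-width is the (0.5 + confidence/2) quantile.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return 2.0 * z * sigma_m


# Illustrative value: a 0.25 m standard deviation needs roughly a 1 m trench
# for 95% confidence of exposing the defect at the first attempt.
length_95 = excavation_length(0.25)
```

Under this assumption, an over-confident SLAM system (one reporting too small a standard deviation) would lead directly to an excavation that is too short, which is why well-calibrated uncertainty matters as much as the point estimate.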

C. SIZE, WEIGHT, POWER, AND COST (SWaP-C) REQUIREMENTS
The size, weight, power and cost (SWaP-C) requirements of robotic systems for water and sewer pipe networks are an important consideration, and pertain to mapping and localization with regard to a number of issues. Size restrictions for robots, and sensors in particular, in small diameter pipes, limit the type of technology that can be used for SLAM. Power is restricted because the mobile robots will be untethered, and will have to travel between charging stations whilst mapping a network; it is essential they do not lose power between these points and become lost in the network, creating a blockage and additional cost of recovery. The cost of robot solutions for inspection and maintenance needs to be competitive with existing manual solutions, for example a 2 person crew with a manually operated CCTV inspection system.
The size of water and sewer pipes varies greatly. The majority of water distribution pipe diameters across the world tend to be in the range of 100-150 mm (see Table 4 for more details) [25], whilst very large trunk mains can be metres wide. Sewer pipes also vary greatly in size, with around 70% of sewers in the UK having a diameter less than 300 mm, whilst only 9% have a diameter of 900 mm or greater [26]. Therefore robots for either water distribution or sewer pipes will need to be relatively small to ensure coverage of a pipe network, or different sizes of robot will need developing. Small robots will in turn necessitate small sensor payloads, batteries and computational hardware.
A crucial further detail is that the size of the entry point to the pipe network is a hard constraint on the robot size. In water distribution pipes fire hydrants provide natural access points, which would be attractive to use because they avoid costs associated with creating special access points for robots. New style through-bore hydrants in the UK have a pipe diameter of only 80 mm, which gives an upper limit on robot size if these were to be used (although many older hydrants in the UK have sharp bends and restrictive valves which would make the insertion of a robot more difficult). Sewer pipes to a large extent avoid concerns over size for entry because they can be accessed by large manholes.
Power requirements relate mainly to actuators, sensors and computational hardware. Modern machine learning techniques that might be used in visual navigation algorithms, for instance based on deep learning [27], [28], might require relatively high power specialist computing devices (based on embedded general purpose graphics processing units GPGPUs). The power requirements, in terms of Watt-hours of operation, should also be considered in conjunction with the aim that these robots should ultimately perform long term inspection of the pipe network, over months and years. This means that the robots will require recharging, and hence there will need to be recharging points specially added to the pipe network. The question about power really becomes one of time and distance between charging, and the cost of infrastructure associated with installing charging points.

D. REQUIREMENTS SPECIFIC TO WATER DISTRIBUTION
Water is a unique commodity in pipes in that, unlike sewerage, oil, and gas (monitoring of pipes transporting the latter two has been reviewed elsewhere [29]), the water must be fit for human consumption. The use of robots must not degrade the safety of water, which creates two main issues in the robot design: 1) avoiding water contamination from introducing foreign bodies into the pipe network, 2) avoiding dislodging material from the pipe wall that will appear in the customer's taps. Whilst these requirements pertain more to mechanical design of the robot and the robot insertion device they are worth noting here due to their importance.
FIGURE 2. Typical SLAM system. The sensors transmit raw data to a 'front-end', which processes the raw data, extracts features and performs data association. The front-end transmits the processed data to the 'back-end', which estimates the robot pose (robot location and orientation) and the map. The back-end typically uses a standard method such as an extended Kalman filter, particle filter or smoothing method to perform the estimation. The back-end can provide feedback to the front-end for loop closure detection. (Redrawn and modified from [22]).

IV. OVERVIEW OF SLAM
The SLAM problem is usually thought of in two parts: the front-end and the back-end (see Fig. 2). The front-end processes raw sensor data to extract features and perform data association, i.e. feature tracking in the short term as a robot moves around an object and loop closure in the long-term when a robot returns to, and recognises, a location previously visited. The back-end part estimates the robot pose (its location and orientation in 2D or 3D space) and the map, using the information extracted by the front-end.

A. PRELIMINARIES
The typical SLAM problem is formulated by representing the history of robot poses $X_{1:k}$, up to the current time-step $k$, together with the map $m$, as the joint probability distribution
$$P(X_{1:k}, m \mid Z_{1:k}, U_{1:k}, x_0) \tag{1}$$
where $x_0$ is the initial robot pose, $U_{1:k}$ is the history of control inputs (or odometry measurements) and $Z_{1:k}$ is the set of all observed map landmarks, respectively
$$X_{1:k} = \{x_1, \ldots, x_k\} \tag{2}$$
$$U_{1:k} = \{u_1, \ldots, u_k\} \tag{3}$$
$$Z_{1:k} = \{z_1, \ldots, z_k\} \tag{4}$$
The robot pose $x_k$ can be defined in 3D space as
$$x_k = [X_k, Y_k, Z_k, \theta_k, \phi_k, \psi_k]^T \tag{5}$$
where $(X_k, Y_k, Z_k)$ defines the location of the robot in 3D space in the world coordinate frame and $(\theta_k, \phi_k, \psi_k)$ defines the orientation in terms of pitch, yaw, and roll, also in the world coordinate frame.
The map, $m$, can be specified by the spatial locations of recognisable features or landmarks detected in the world, where
$$m = [l_1, \ldots, l_{n_m}]^T \tag{6}$$
and $l_i = [l_{i,x}, l_{i,y}, l_{i,z}]^T$ is the location of the $i$th landmark in the world coordinate frame, and $n_m$ is the number of map features. The map can also be represented in a more dense form, e.g. as a grid map. We assume the map to be static throughout this paper, therefore $m$ is not a function of the time-step $k$, in contrast to the robot pose. We assume that the robot state can be updated via
$$x_k = f(x_{k-1}, u_k) + w_k \tag{7}$$
where the $f(.)$ function defines the state transition, $u_k$ is an input to the robot (or odometry measurement), and $w_k$ is state noise we assume to be Gaussian, zero-mean, white noise, with covariance $Q_k$, i.e. $w_k \sim \mathcal{N}(0, Q_k)$. Next, we assume that an observation $z_k$ of a map feature can be related to the state via the measurement function $h(.)$,
$$z_k = h(x_k, m) + v_k \tag{8}$$
where $v_k$ is observation noise that we assume to be Gaussian, zero-mean, white noise, with covariance $R_k$, i.e. $v_k \sim \mathcal{N}(0, R_k)$.
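To make the state transition and measurement models concrete, the following sketch simulates a deliberately simplified case: a robot moving in one dimension along a pipe axis and observing a single static landmark (for example a pipe joint). The linear forms chosen for `f` and `h`, the scalar noise variances `q` and `r`, and all numerical values are assumptions made for illustration, not models taken from the surveyed literature.

```python
import random


def f(x_prev, u):
    """State transition: previous along-pipe position plus odometry increment."""
    return x_prev + u


def h(x, landmark):
    """Measurement function: signed range from the robot to the landmark."""
    return landmark - x


def simulate(x0, controls, landmark, q, r, seed=0):
    """Simulate poses and observations with additive Gaussian noise.

    q and r are the (scalar) variances of the state and observation noise.
    """
    rng = random.Random(seed)
    x, poses, observations = x0, [], []
    for u in controls:
        x = f(x, u) + rng.gauss(0.0, q ** 0.5)          # w_k ~ N(0, q)
        z = h(x, landmark) + rng.gauss(0.0, r ** 0.5)   # v_k ~ N(0, r)
        poses.append(x)
        observations.append(z)
    return poses, observations


# Illustrative run: three 1 m steps towards a landmark 10 m from the start.
poses, observations = simulate(0.0, [1.0, 1.0, 1.0], landmark=10.0,
                               q=0.01, r=0.04)
```

Because the state noise is added at every step, an estimate built from the odometry increments alone accumulates error, which is the drift that landmark observations and loop closures are used to correct.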

B. THE SLAM FRONT-END: FEATURE EXTRACTION, DATA ASSOCIATION, AND LOOP CLOSING
A variety of sensors are used in robotics, which produce different types of raw data. For instance cameras produce pixel data, whilst laser scanners produce range and bearing data. The SLAM front-end performs feature extraction initially, processing these differing types of raw data into a measurement format $z_k$ (the features) that can be used directly in the back-end estimation process. The front-end also performs data association, where a feature in the map $m$ is found which is most likely to be associated with the measurement $z_k$, which allows us to write the measurement equation above in (8). Data association has two aspects: 1) short term data association, which handles the data association of consecutive sensor measurements; 2) long term data association, i.e. loop closing, where a robot recognises a place previously visited by associating current measurements to previously mapped landmarks. Data association can have two main types of error: 1) false positives, where there is an incorrect association made between an observation and the map (this can lead to catastrophic failure in the back-end algorithm); 2) false negatives, where an observation is rejected as spurious (this leads to reduced data for the back-end, which can reduce the estimation accuracy but is arguably less serious than a false positive).

1) DATA ASSOCIATION METHODS
Data association can be performed using simple statistical validation gating, as used in target tracking [30]. The idea of a validation gate is that any previously mapped landmarks in the map have to fall within a region defined by the gate to be considered valid for association with the current measurement of a landmark. To clarify this point, consider a set of hypotheses $H = \{j_1, \ldots, j_{n_m}\}$, each of which associates one measurement $z_i$ with one landmark $l_{j_i}$; the measurement equation from (8) under hypothesis $H$ is therefore
$$z_H = h_H(x_k, m_H) + v_k \tag{9}$$
where $h_H = (h_{1 j_1}, \ldots, h_{n_m j_{n_m}})$ is the collection of independent measurement models and $m_H = [l_{j_1}, \ldots, l_{j_{n_m}}]^T$ is the vector of map landmarks corresponding to $z_H = (z_1, \ldots, z_{n_m})^T$. We can obtain a measure of distance between actual and predicted measurements under hypothesis $H$ by using the Mahalanobis distance,
$$D_H^2 = \left(z_H - h_H(\hat{x}_k, \hat{m}_H)\right)^T S_H^{-1} \left(z_H - h_H(\hat{x}_k, \hat{m}_H)\right) \tag{10}$$
where $\hat{x}_k$ is the estimated robot pose, $\hat{m}_H$ is the corresponding estimate of the landmarks, $S_H = H_H P_H H_H^T + R$ is the innovation covariance, $P_H$ is the estimated covariance of robot pose and landmarks, and $H_H$ is the Jacobian associated with the measurement function $h_H$. For a measurement/landmark pairing to be considered acceptable (or jointly compatible), the Mahalanobis distance $D_H^2$ should lie within the validation gate based on the chi-squared distribution,
$$D_H^2 < \chi^2_{d, \alpha} \tag{11}$$
where $d = \dim(h_H)$ is the dimension of the measurement function and $\alpha$ is the confidence level.
The gating method avoids unlikely data associations but a further problem is that multiple landmarks might fall into the gated region defined by (11), in which case there must be some additional method of data association. One of the simplest methods is to associate the measurement with the nearest mapped landmark within the gate, i.e. a nearest neighbour approach, known as individual compatibility nearest neighbour (ICNN) [31].
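The gating and nearest neighbour steps described above can be sketched for a single two-dimensional measurement as follows. This is a simplified illustration rather than the ICNN implementation of [31]: it assumes the predicted measurement of each mapped landmark has already been computed, it uses one shared innovation covariance `S` for all candidates (in practice this differs per landmark), and the chi-squared thresholds are tabulated 95% values. The function names are hypothetical.

```python
# 95% chi-squared quantiles, indexed by measurement dimension d.
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815}


def mahalanobis_sq(z, z_pred, S):
    """Squared Mahalanobis distance between a 2D measurement and a prediction."""
    dx, dy = z[0] - z_pred[0], z[1] - z_pred[1]
    a, b, c, d = S[0][0], S[0][1], S[1][0], S[1][1]
    det = a * d - b * c
    # v^T S^-1 v, using the closed-form inverse of a 2x2 matrix.
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def nearest_neighbour_in_gate(z, predicted, S, gate=CHI2_95[2]):
    """Index of the closest landmark inside the validation gate, or None."""
    best, best_d2 = None, gate
    for j, z_pred in enumerate(predicted):
        d2 = mahalanobis_sq(z, z_pred, S)
        if d2 < best_d2:
            best, best_d2 = j, d2
    return best


# Illustrative values: three predicted landmark measurements, 0.5 m std dev.
predicted = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
S = [[0.25, 0.0], [0.0, 0.25]]
match = nearest_neighbour_in_gate((0.9, 0.1), predicted, S)
no_match = nearest_neighbour_in_gate((10.0, 10.0), predicted, S)
```

In this example two landmarks fall inside the gate for the first measurement and the nearer one is selected, whereas the second measurement is rejected as having no valid association.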
The early implementations of SLAM used the nearest neighbour approach, e.g. [32], however, this method is prone to error when used with more than just a few landmarks [21]. Later approaches were developed that considered data association in a more robust batch mode, such as joint compatibility branch and bound (JCBB) [31]. JCBB has also been extended to handling 2D lidar scans where the measurement points are numerous and correlated [33].
Multiple hypothesis methods have also been used in data association for SLAM, from very early work [34] to modern systems that e.g. extend JCBB to MHJCBB (multiple hypothesis JCBB) [35]. These methods have the potential to be more robust but also tend to be much more computationally intensive.

2) LOOP CLOSING BASED ON APPEARANCE RECOGNITION
Loop closing is often based on appearance recognition. This can be done using various sensors such as cameras and laser scanners. The fast appearance-based mapping (FAB-MAP) algorithm [36] developed a visual appearance recognition algorithm using a bag-of-words (BOW) approach modified from speech recognition: the idea is to build a database of images stored as numerical vectors in the BOW space. To construct the BOW, visual features are extracted from images of places to create visual words, using methods such as SIFT [37] or SURF [38], then each place is represented by a histogram, which is the frequency of occurrences of each visual word in the image. This approach was made more efficient in FAB-MAP2 enabling loop closures for much larger environments [39]. Appearance-based recognition using the BOW method has also been extended and made more robust by adding a fast geometrical check to the image matching procedure [40].
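The bag-of-words place representation can be illustrated with a short sketch. It assumes that image features have already been extracted and quantised into integer visual word identifiers, and it compares histograms with cosine similarity; this similarity measure is a simplification swapped in for illustration and is not the probabilistic model used by FAB-MAP. All names and data below are hypothetical.

```python
from collections import Counter
import math


def bow_histogram(words, vocab_size):
    """Histogram of visual word occurrences for one image."""
    counts = Counter(words)
    return [counts.get(w, 0) for w in range(vocab_size)]


def cosine_similarity(h1, h2):
    """Cosine of the angle between two histograms (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(h1, h2))
    n1 = math.sqrt(sum(a * a for a in h1))
    n2 = math.sqrt(sum(b * b for b in h2))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def best_match(query_words, database, vocab_size):
    """Return (index, score) of the stored place most similar to the query."""
    q = bow_histogram(query_words, vocab_size)
    scores = [cosine_similarity(q, bow_histogram(ws, vocab_size))
              for ws in database]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]


# Illustrative data: three stored places, each a list of visual word IDs.
database = [[0, 0, 1, 2], [3, 3, 4], [1, 2, 2, 2]]
index, score = best_match([3, 4, 3], database, vocab_size=5)
```

Because the histogram discards where in the image each word occurred, two different places containing the same words are indistinguishable, which is one motivation for the geometrical check added in [40].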
FAB-MAP only uses a single image to perform appearance recognition, which can be sensitive to variation in appearance, due to e.g. changing light conditions, seasonal changes, viewpoint variations and dynamic objects. Therefore, SeqSLAM was developed [41], which uses a sequence of images to perform appearance recognition and tends to be a more robust approach. SeqSLAM uses a sum-of-absolute differences (SAD) to match image sequences between the recent observations and a database. SeqSLAM has received various updates, such as more efficient versions that avoid exhaustive search and instead use efficient tree searching with nearest neighbours [42]. A review of these types of appearance-based recognition methods such as FAB-MAP and SeqSLAM can be found in [43].
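The benefit of matching image sequences rather than single images can be shown with a simplified sketch of sum-of-absolute-differences matching. It is not the SeqSLAM algorithm of [41]: it assumes tiny images stored as flat lists of intensities, a database ordered along the route, and a query sequence that advances one database frame per query frame. The function names and image values are hypothetical.

```python
def sad(img_a, img_b):
    """Sum of absolute differences between two equally sized images."""
    return sum(abs(a - b) for a, b in zip(img_a, img_b))


def match_sequence(query, database):
    """Return (start_index, cost) of the best aligned window in the database."""
    length = len(query)
    if length == 0 or length > len(database):
        raise ValueError("query must be non-empty and no longer than database")
    best_start, best_cost = 0, float("inf")
    for start in range(len(database) - length + 1):
        cost = sum(sad(q, database[start + i]) for i, q in enumerate(query))
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost


# Illustrative route in which frames 0 and 2 look identical, as might happen
# with repeated pipe sections.
database = [[0, 0], [10, 10], [0, 0], [10, 10], [20, 20]]
single = match_sequence([[1, 0]], database)
sequence = match_sequence([[1, 0], [9, 10], [21, 20]], database)
```

Here the single query image matches frames 0 and 2 equally well, so the match returned is arbitrary, whereas the three-frame sequence is matched unambiguously to the window starting at frame 2.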
Deep learning methods have now also been applied to the problem of visual appearance-based recognition, where initial approaches used deep convolutional neural networks (DCNNs) pre-trained for image recognition [44], [45]: the idea is to use the inner layers of the DCNN as automatically generated features rather than the 'hand-crafted' features typically used in appearance recognition such as SIFT or SURF. The development of NetVLAD improved the use of the basic DCNN by taking the DCNN features and passing them through a vector of locally aggregated descriptors (VLAD) module, specifically designed for image retrieval [46]. Subsequently, [47] introduced a large-scale dataset for purpose-specific training of DCNNs for visual place recognition, which demonstrated improvements on just using re-purposed image recognition DCNNs. Recent work has demonstrated that both traditional image processing features, histogram of oriented gradients (HOG), and DCNN features can give robust performance when using image sequences and maintaining multiple hypotheses for matching [48]. To combine advantages across the methods mentioned above, SAD, HOG and DCNN features have been fused to give state of the art performance [49].
Range data, often from laser scanners or lidar (light detection and ranging), can also be used to perform loop closing from appearance recognition either in 2D [50], [51] or 3D [52], [53]. The use of range data overcomes the sensitivity of cameras to different lighting conditions. Similarly to visual appearance recognition, deep learning has also been applied in recent years to the problem of range based 3D lidar appearance recognition [54].

C. THE SLAM BACK-END: ROBOT POSE AND MAP ESTIMATION
SLAM back-end estimation algorithms to obtain the robot pose, $x_k$, and map, $m$, fall broadly into one of two classes of algorithm:
1) Filter-based algorithms: these algorithms recursively estimate the current robot pose and map, i.e. they produce the estimate $\hat{x}_k$, and are usually formulated as a Bayesian filtering problem. The main approaches are based on the extended Kalman filter (EKF), (i.e. EKF-SLAM, as used in early pioneering work [55]-[57] and then latterly with more rigorous convergence analysis [58]), the sparse extended information filter (SEIF) [59], the particle filter (PF) (e.g. DP-SLAM [60]), and the Rao-Blackwellised particle filter (RBPF), (e.g. FastSLAM [61], [62] and variants [63]).
2) Smoother/optimization-based algorithms: these algorithms estimate the history of robot poses and map using all the data in a batch mode, i.e. they produce the estimate $\hat{X}_{1:k}$, and are usually formulated as a sparse nonlinear least squares problem. The main approaches are graph-based methods, such as GraphSLAM [64], [65], smoothing and mapping (SAM) [66] and incremental smoothing and mapping (iSAM/iSAM2) [67], [68].
To expand on the pose/map estimation problem, firstly, we define the following maximum a posteriori (MAP) problem following from (1),
$$\{\hat{X}_{1:k}, \hat{m}\} = \arg\max_{X_{1:k},\, m} P(X_{1:k}, m \mid Z_{1:k}, U_{1:k}, x_0) \tag{12}$$
where, assuming that the measurements and state predictions are independent, and using Bayes' rule, we can say that
$$P(X_{1:k}, m \mid Z_{1:k}, U_{1:k}, x_0) = \eta \prod_{i=1}^{k} P(x_i \mid x_{i-1}, u_i)\, P(z_i \mid x_i, m) \tag{13}$$
where $\eta$ is a normalising constant. Note that to simplify the equations, we take the common assumption that the initial pose $x_0$ is known with full certainty; even if this is not the case, in an arbitrary coordinate system, $x_0$ can be taken to be at the origin and initialised to zero [69].
Substituting (7) and (8), which are both subject to Gaussian noise, into (13), and noting that maximising (13) is equivalent to minimising the negative log posterior, leads to the nonlinear weighted least squares problem, similar to that defined in [64] and [66],

$$J(X_{1:k}, m) = \sum_{i=1}^{k} \left\| x_i - f(x_{i-1}, u_i) \right\|^2_{Q_i^{-1}} + \sum_{i=1}^{k} \left\| z_i - h(x_i, m) \right\|^2_{R_i^{-1}} \tag{14}$$

where $\|a\|^2_P = a^T P a$, and where the weighting matrices are $Q_k^{-1}$ and $R_k^{-1}$. The main differences in SLAM estimation algorithms lie in how this cost function $J(X_{1:k}, m)$ is minimised. As noted above, the filter-based approaches, EKF [56]-[58], SEIF [59], PF [60] and RBPF [61], [63], use recursive algorithms to produce the estimate of the current robot pose $\hat{x}_k$. The smoothing/optimization algorithms, GraphSLAM [64], [65] and SAM [66]-[68], by contrast, operate on a batch of data using sparse least squares methods to produce the estimate of the entire state history $\hat{X}_{1:k}$, where the sparse structure in the SLAM problem is exploited to make the algorithms more computationally efficient. The sparse structure arises from the fact that each landmark is only observed from a small set of poses. A key advantage of the smoothing and optimization algorithms is that they intrinsically correct all previous robot poses when new loop closures are made.
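As a minimal sketch of the batch formulation, the following Python example solves a one-dimensional, linear analogue of the weighted least squares problem. The motion model $f(x, u) = x + u$, the single loop-closure measurement of the final pose, the scalar variances `q` and `r`, and all function names are assumptions made purely for illustration; they are not taken from the cited algorithms.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def batch_estimate(odometry, closure, q=1.0, r=1.0):
    """Estimate poses x_1..x_k in 1D with x_0 = 0 known.

    Residuals: x_i - x_{i-1} - u_i (weight 1/q) for each odometry input,
    and x_k - closure (weight 1/r) for one loop-closure measurement.
    """
    k = len(odometry)
    rows, rhs, weights = [], [], []
    for i, u in enumerate(odometry):
        row = [0.0] * k
        row[i] = 1.0
        if i > 0:
            row[i - 1] = -1.0
        rows.append(row)
        rhs.append(u)
        weights.append(1.0 / q)
    row = [0.0] * k
    row[-1] = 1.0
    rows.append(row)
    rhs.append(closure)
    weights.append(1.0 / r)
    # Normal equations (A^T W A) x = A^T W b; the matrix is tridiagonal (sparse).
    n_res = len(rows)
    H = [[sum(weights[m] * rows[m][i] * rows[m][j] for m in range(n_res))
          for j in range(k)] for i in range(k)]
    g = [sum(weights[m] * rows[m][i] * rhs[m] for m in range(n_res))
         for i in range(k)]
    return solve_linear(H, g)


poses = batch_estimate([1.0, 1.0, 1.0], closure=2.7)
```

With three unit odometry steps and a loop closure that places the final pose at 2.7, the discrepancy of 0.3 is spread over every pose in the trajectory rather than only the latest one, which is the behaviour described above for smoothing methods.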
Most of the SLAM algorithms (EKF, SEIF, GraphSLAM and SAM), i.e. all except those based on the PF, use a linearised form of $J(X_{1:k}, m)$ to simplify the estimation procedure: for the filtering algorithms this linearisation enables the propagation of a Gaussian distributed state estimate. For the smoothing and optimization algorithms the linearisation enables the use of efficient, sparse least squares methods. In each algorithm, the nonlinear state and observation functions are linearised using a truncated Taylor series expansion around a linearisation point of the state, $\hat{x}_k$, and the map, $\hat{m}$,

$$f(x_{k-1}, u_k) \approx f(\hat{x}_{k-1}, u_k) + F_k \left( x_{k-1} - \hat{x}_{k-1} \right)$$

$$h(x_k, m) \approx h(\hat{x}_k, \hat{m}) + H_k \left( x_k - \hat{x}_k \right) + M_k \left( m - \hat{m} \right)$$

where $F_k$ and $H_k$ are respectively Jacobian matrices of partial derivatives of $f$ and $h$ with respect to the state, $x_k$, whilst $M_k$ is the Jacobian matrix of partial derivatives of $h$ with respect to the map, $m$. An important advantage of the smoothing and optimization algorithms, related to this linearisation, is that they are able to iterate the linearised problem until convergence, whereas the filtering algorithms (EKF and SEIF) only perform a single update, which means that the filtering methods can be more prone to linearisation error.
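The effect of iterating the linearisation can be sketched with a scalar example. The code below is an illustration under assumed values only: the measurement function $h(x) = x^2$, the prior, the variances `p` and `r`, and the function names are all hypothetical. A single Gauss-Newton step linearised about the prior is loosely analogous to a single filter update, whereas repeating the step relinearises about each new estimate.

```python
def cost(x, prior, p, z, r):
    """Weighted least squares cost for a prior on x and a measurement z of x**2."""
    return (x - prior) ** 2 / p + (z - x ** 2) ** 2 / r


def gauss_newton_step(x, prior, p, z, r):
    """One Gauss-Newton step, linearising h(x) = x**2 about the current x."""
    H = 2.0 * x                                   # Jacobian of h at x
    hessian = 1.0 / p + H * H / r                 # Gauss-Newton Hessian (scalar)
    gradient = (x - prior) / p - H * (z - x ** 2) / r
    return x - gradient / hessian


def iterate(x, prior, p, z, r, tol=1e-12, max_iter=50):
    """Repeat the linearised update until the estimate stops changing."""
    for _ in range(max_iter):
        x_new = gauss_newton_step(x, prior, p, z, r)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


prior, p, z, r = 1.0, 1.0, 4.0, 0.01
single = gauss_newton_step(prior, prior, p, z, r)   # one linearisation only
converged = iterate(prior, prior, p, z, r)          # relinearised each step
```

In this example the single step overshoots the minimiser because the Jacobian evaluated at the prior is a poor approximation near the solution; the iterated estimate attains a lower cost.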
As a consequence of linearisation error the EKF can become inconsistent [69]-[71]: a filter is consistent if the estimation error sequence, $x_k - \hat{x}_k$, is zero-mean and the state covariance, $P_k$, matches the true covariance [72],

$$E\left[ x_k - \hat{x}_k \right] = 0, \qquad E\left[ (x_k - \hat{x}_k)(x_k - \hat{x}_k)^T \right] = P_k$$

where $E[\cdot]$ is the mathematical expectation operation. The inconsistency tends to arise in EKF-SLAM because the linearisation of $f$ and $h$ does not occur at the true state [71], and therefore the estimated state covariance, $P_k$, becomes less than the true value, i.e. the filter becomes overconfident, leading to filter divergence [69], [71]. Smoothing and optimization methods tend to be less prone to this inconsistency because the estimates are computed in a batch and are iterated to convergence [73].
The PF and RBPF methods, by contrast, avoid the linearisation of $f$ and $h$: particle filters are able to fully utilise the nonlinear model and represent the non-Gaussian state posterior using a numerical sampling approach. However, in SLAM where the number of map features can be large and tends to grow without bound, sampling methods can be computationally expensive. In FastSLAM [61], the RBPF is used to exploit a factorisation to make the problem more computationally tractable, where using the product rule we re-write (1) as

$$p(X_{1:k}, m \mid Z_{1:k}, U_{1:k}, x_0) = p(X_{1:k} \mid Z_{1:k}, U_{1:k}, x_0) \prod_{i=1}^{n_m} p(l_i \mid X_{0:k}, Z_{1:k})$$

where the key insight to note is that the map features $m = [l_1, \ldots, l_{n_m}]^T$ become independent when conditioned on the full robot trajectory $X_{0:k}$ [61]. The robot trajectory posterior, $p(X_{1:k} \mid Z_{1:k}, U_{1:k}, x_0)$, is estimated using a particle filter, with $n_p$ particles, where each particle represents a possible instance of the robot trajectory, and each particle uses $n_m$ EKFs to separately represent and update each landmark $l_i$ in the map. Each EKF is therefore low-dimensional because it represents only one landmark, hence the naive complexity of FastSLAM is $O(n_p n_m)$, i.e. linear in the number of map landmarks.
By contrast, in EKF-SLAM the full covariance corresponding to all $n_m$ landmarks is used, meaning that updates are $O(n_m^2)$, i.e. EKF-SLAM updates have quadratic complexity. Hence, FastSLAM scales better than EKF-SLAM with map size whenever the number of particles is small relative to the number of landmarks.
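The per-particle structure behind this factorisation can be sketched as follows. This is an illustrative one-dimensional toy rather than the FastSLAM implementation of [61]: the `Particle` class, the measurement model $z = l_i - x_k$ and the noise variance `r` are assumptions. Each particle holds its own pose hypothesis and one small independent filter per landmark, so an observation of one landmark touches only that landmark's mean and variance.

```python
import math


class Particle:
    """One trajectory hypothesis with an independent scalar filter per landmark."""

    def __init__(self, pose, landmarks):
        self.pose = pose                    # current robot position (1D)
        self.landmarks = dict(landmarks)    # landmark id -> (mean, variance)
        self.weight = 1.0

    def update(self, landmark_id, z, r):
        """Kalman update of a single landmark; z measures (landmark - pose)."""
        mean, var = self.landmarks[landmark_id]
        innovation = z - (mean - self.pose)
        s = var + r                         # innovation variance
        gain = var / s
        self.landmarks[landmark_id] = (mean + gain * innovation,
                                       (1.0 - gain) * var)
        # Importance weight: Gaussian likelihood of the innovation.
        self.weight *= (math.exp(-0.5 * innovation ** 2 / s)
                        / math.sqrt(2.0 * math.pi * s))


particle = Particle(pose=0.0, landmarks={0: (5.0, 1.0), 1: (9.0, 1.0)})
particle.update(landmark_id=0, z=5.4, r=1.0)
```

Because the landmark filters never share a joint covariance, no $n_m \times n_m$ matrix is stored or updated, which is where the saving relative to EKF-SLAM arises.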
One of the problems with SLAM back-end optimization methods is that they can be sensitive to outliers arising from incorrect data associations and false-positive loop closure errors. These outliers can cause the whole SLAM system to fail because of the quadratic nature of the cost in (14), which gives undue influence to measurements with large residual errors. Therefore, robust loss functions that are less sensitive to outliers, such as the Huber loss, can be used in place of the standard quadratic loss [74]. This type of approach is used in early work in dynamic covariance scaling [75] and related methods such as switchable constraints [76], [77] and max mixtures [78], which use a tunable weighting to decrease the influence of inconsistent loop closures on the optimization. In recent work, adaptive kernels for robust cost functions have been developed [79]. There are also distinct methods that check for, and exclude, inconsistent loop closures from the optimization, such as realizing, reversing, recovering (RRR) [80].
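The difference between the quadratic and Huber losses can be seen in a short sketch. The threshold `delta` and the function names are assumptions for illustration; in practice the threshold would be tuned to the expected residual scale.

```python
def quadratic_loss(residual):
    """Standard least squares loss: grows with the square of the residual."""
    return 0.5 * residual ** 2


def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for small residuals, linear beyond delta."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual ** 2
    return delta * (a - 0.5 * delta)


inlier, outlier = 0.5, 10.0
losses = {
    "quadratic": (quadratic_loss(inlier), quadratic_loss(outlier)),
    "huber": (huber_loss(inlier), huber_loss(outlier)),
}
```

For a residual of 10 with `delta = 1`, the quadratic loss is 50 whereas the Huber loss is 9.5, so a single false-positive loop closure contributes far less to the total cost, whilst small residuals are treated identically by both losses.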

D. VISUAL ODOMETRY AND VISUAL SLAM
Cameras have become a dominant sensor-type for robotics, owing to their versatility and usefulness across various tasks, for instance mapping and localization, structure from motion, obstacle avoidance, object detection and recognition, scene understanding and human-operator support. There is a correspondingly wide and specialist literature on the use of cameras in SLAM, which therefore motivates its own section here. Cameras are also often used in visual odometry (VO) in robotics, which is the localization-only part of the SLAM problem, i.e. pose-estimation, without map-building [81]. The full visual SLAM (vSLAM) problem can be loosely defined as [82],
vSLAM = VO + Global Map Optimization,
where the Global Map Optimization is typically done using the loop closing and smoothing/optimization algorithms discussed above.
The vSLAM/VO problem can be solved using either stereo [83]-[88] or monocular (single) cameras [89], [90]. Monocular camera systems tend to be simpler than stereo systems but have the disadvantage that they do not intrinsically perform depth perception. Therefore, monocular systems generally require multiple overlapping views of a scene, from distinct perspectives, to obtain depth perception algorithmically (i.e. by triangulation). Even with algorithmic depth perception, monocular systems are still subject to scale ambiguity and scale drift.
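Scale ambiguity in monocular systems can be illustrated with a simple pinhole projection. The sketch below assumes an idealised camera with no rotation, looking along the +z axis, and hypothetical function names: scaling both the scene and the camera translation by the same factor leaves every image measurement unchanged, so the absolute scale cannot be recovered from the images alone.

```python
def project(point, camera_position, focal_length=1.0):
    """Pinhole projection of a 3D point for an unrotated camera looking along +z."""
    x, y, z = (p - c for p, c in zip(point, camera_position))
    return (focal_length * x / z, focal_length * y / z)


def scale(vector, s):
    """Scale a 3D vector by the factor s."""
    return tuple(s * v for v in vector)


point = (0.2, -0.1, 2.0)          # scene point
second_view = (0.1, 0.0, 0.0)     # camera position for the second image

original = project(point, second_view)
scaled = project(scale(point, 3.0), scale(second_view, 3.0))
```

Here `original` and `scaled` agree to within floating-point precision even though the scene and baseline are three times larger in the second case. A stereo rig removes the ambiguity because its baseline is known in metric units.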
VO algorithms are of particular interest in pipe robotics because reconstruction of the robot path from the pose, $x_k$, intrinsically generates the pipe network map (because the robot moves within the pipe, the robot pose $x_k$ defines the pipe location as well as the robot pose). Early monocular VO estimation algorithms tended to be based on filtering methods such as the EKF [89], [90], but keyframe optimization methods have become more dominant recently [88], [91]-[93]. Filtering methods marginalise out past poses and summarise historical information using a probability distribution. Keyframe optimization methods instead use efficient batch least squares algorithms to estimate the pose over a small number of keyframes selected from the recent frame history. One study concluded that keyframe optimization methods tend to be more accurate per unit of computing time [94]. A limitation of VO algorithms is that they do not perform loop closure in contrast to full vSLAM algorithms, so drift in pose estimates goes uncorrected.
FIGURE 3. An illustration of two visual SLAM algorithms, DSO and ORB-SLAM, applied to sewer pipe images. The leftmost two images illustrate DSO, with a frame of the processed video on the left, and an estimated map of the pipe from a video sequence on the right. The rightmost two images illustrate ORB-SLAM, with a frame of the processed video on the left, and an estimated map of the pipe from a video sequence on the right.
Popular recent approaches to VO can be divided into feature-based methods, which use feature extraction to obtain image frame correspondences, such as ORB-SLAM [88], [93], and direct methods, which operate directly on pixel intensity, such as large scale direct SLAM (LSD-SLAM) [95] and direct sparse odometry (DSO) [92] (where DSO and ORB-SLAM are illustrated using our own implementations, not published, in Figure 3).
Another important class of VO method is based on deep learning, using convolutional neural networks [96], deep recurrent convolutional neural networks (DeepVO/ESP-VO) [97], [98], unsupervised deep learning (UnDeepVO) [28], generative adversarial networks [99] and deep networks driven by optic flow [100]-[102]. In [103] a deep learning VO method was developed for underwater applications, which is relevant to water distribution pipes, and showed promise compared to standard methods (although the underwater environment did prove challenging for pose estimation). Full vSLAM has also been addressed using deep learning [104]. The deep learning methods appear to give results competitive with other VO methods on benchmark problems, and have the advantage that they are end-to-end, so they do not require camera calibration, feature extraction and matching, or online optimization. They do, however, tend to require large amounts of training data, which may be problematic for sewer and water pipes.
VO is often fused with an inertial measurement unit (IMU), known as visual-inertial navigation systems (VINS), or visual inertial odometry (VIO) [105], [106]. Fusing VO with an IMU tends to improve accuracy, is low cost, and for monocular systems helps to resolve the scaling ambiguity. Recent popular VIO systems include MSCKF [107], ROVIO [108], VINS-Mono [109] and Vi-DSO [110]. MSCKF is termed a loosely coupled approach and is relatively simple to implement and computationally inexpensive (a Kalman filter fuses the VO pose estimate with the IMU, and potentially other sensors as well), whilst the others are tightly coupled (IMU data is included in the pose optimization), which tend to be more accurate [111]. The VIO problem, like many of the computer vision problems discussed in this review, has in recent years been addressed using deep learning such as in VINet [112].

E. LASER SCANNERS AND LIDAR FOR SLAM
Laser scanners and lidar (light detection and ranging) are one of the other major sensor-types, along with cameras, used in SLAM. Lidar sensors produce a scan of the environment that returns the range and bearing of nearby objects at discrete sample points; scans can be in 2D [113]-[115] or 3D [116]-[119]. The lidar SLAM problem is often divided, similarly to vision methods, into an odometry-type of problem using sequential scans for pose estimation, and separate map updating with loop closing [120].
A scan matching algorithm is typically used with lidar to estimate the pose of the robot. The current scan is matched against the previous scan of known pose (i.e. scan-to-scan matching) to provide an estimate of the transformation between the two scans, and this transformation can be used to update the pose of the robot. However, this is essentially a type of odometry method that will drift over time. Scan matching is usually based on iterative closest point (ICP) type algorithms [113], [121], which involve a minimisation of scan matching error as a function of the transformation between scans. Scan matching error can be minimised in terms of scan points [113], or extracted features such as lines [122].
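A minimal point-to-point ICP sketch in 2D is given below to make the two alternating steps concrete: nearest-neighbour association followed by a closed-form least squares rigid alignment. It is an illustration under simplifying assumptions (noise-free points, brute-force matching, a small initial misalignment, hypothetical function names) and is not the specific algorithm of [113] or [121].

```python
import math


def transform(points, theta, tx, ty):
    """Apply a 2D rotation by theta followed by a translation (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def align(src, dst):
    """Closed-form least squares rigid alignment of paired 2D points."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy
        bx, by = qx - dx, qy - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def icp(scan, reference, iterations=10):
    """Estimate the transform that maps scan onto reference."""
    theta, tx, ty = 0.0, 0.0, 0.0
    for _ in range(iterations):
        moved = transform(scan, theta, tx, ty)
        # Association step: closest reference point for each scan point.
        matched = [min(reference,
                       key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in moved]
        d_theta, d_tx, d_ty = align(moved, matched)
        # Compose the incremental transform with the running estimate.
        c, s = math.cos(d_theta), math.sin(d_theta)
        tx, ty = c * tx - s * ty + d_tx, s * tx + c * ty + d_ty
        theta += d_theta
    return theta, tx, ty


scan = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (3.0, 1.0), (3.0, 2.0)]
reference = transform(scan, 0.05, 0.2, 0.1)
estimate = icp(scan, reference)
```

The association step is where ICP can fail: if the initial misalignment is large relative to the point spacing, points are paired incorrectly and the minimisation can converge to a wrong local minimum.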
Loop closing can be performed with lidar using scan-to-map matching [123]. This includes methods based on feature extraction, which reduces computational complexity [124], histogram-based matching [51] and machine learning [125]. Submaps can also be used in lidar scan-to-submap matching [115], [126], which improves computational efficiency and enables real-time loop closing.
Lidar can also be fused with vision to overcome the weaknesses of each sensor: visual SLAM relies on adequate visual features to function effectively, whilst lidar can be sensitive to rapid motion (because the point cloud can become distorted due to the robot motion interfering with the lidar scanning process). Fusion of vision and lidar can alleviate these problems [119], [127]. Vision-lidar fusion is reviewed in [128].

F. COMPARISON OF SLAM ALGORITHMS
Front-end methods for SLAM include data association and loop closing. These will have to be robust for the pipe environment because of the high likelihood of perceptual aliasing. There is a wealth of data association methods developed for mobile and manipulation robot SLAM, but data association for robot SLAM in pipes has not been addressed. For loop closing we would expect that vision-based appearance-based mapping on its own will be challenging, therefore the use of multi-sensor data fusion and prior map knowledge will be important to improve robustness.
The key advantages and disadvantages of each back-end SLAM algorithm lie in a number of factors. EKF methods are relatively simple, and tend to perform well in small to medium map problems, but can be inconsistent, leading to filter divergence. SEIF and FastSLAM improve on the computational efficiency of EKF-SLAM. The smoothing/optimization methods are advantageous over the filter-based methods because they treat all data in the estimation, which intrinsically leads to correction of older poses and map estimates when loops are closed, and are less prone to divergence. Hence, smoothing methods tend to be preferred in modern SLAM implementations.
SLAM in pipes will require computationally efficient solutions for relatively small robots with modest computational resources. The methods for SLAM discussed above can be extremely computationally intensive, even for the simplest methods, and especially for modern techniques aimed at robustness using multiple hypothesis methods in front-end data association [35], and back-end pose/map estimation using optimization/smoothing [129]. Therefore it is likely that SLAM in pipes will assume known maps for online localization and only perform SLAM intermittently.

V. MAPS FOR SLAM IN WATER AND SEWER PIPE NETWORKS
The choice of representation of the map of the robot's environment has implications for accuracy, precision, computational efficiency, and robustness of the SLAM system. In general, robots across different applications use a variety of descriptions of their surrounding environment, often depending on the application. In this section, the range of map representations used in the literature is described, and their usefulness for robots in pipe environments is evaluated.
There are also a number of auxiliary factors that will affect the choice of map for use for SLAM in pipes. For instance, the representation might depend to a certain extent on the locomotion and sensors used by the robot. The locomotion type can determine the space within which the robot can move, and therefore must be localised. A variety of types of robot locomotion have been developed for use in pipes, reviewed in [130], which includes flying or swimming through a pipe with six degrees of freedom, moving along the cylindrical surface of the pipe with fewer degrees of freedom, and moving along the axis of the pipe by pressing against opposite walls giving only one degree of freedom. Therefore, the dimension of the map representation might naturally coincide with the degrees of freedom in the robot movement.
It is also worth noting that maps used by water utilities of buried pipe networks, such as that shown in Figure 1, often exist to varying degrees of accuracy. However, this type of map cannot necessarily be used directly by a robot for localization, and conversely a map estimated by a SLAM algorithm would not necessarily be of a form that would be directly useful for a human operator in a water utility company. The remainder of this section will describe different types of map used in SLAM, as opposed to maps used by humans.

A. FEATURE MAPS
Feature-based maps could be made using features at a variety of scales, and could be made using features which are generally describable such as walls or doors, or using features which depend more on the sensing mode such as notable sets of pixels in a set of camera images. These two categories of feature-based maps are described here.

1) POINT FEATURES
Point features can be extracted from sensor data such as images, typically corresponding to significant points in the environment which might be recognised and distinguished from other points (see Fig. 4(a)). As noted elsewhere [92], features can be sparse or dense, and direct or indirect. In this section, methods that use some level of indirect representation of points in the environment will be described, distinguished from methods using direct sensor measurements such as the distance of reflection of a lidar beam or the intensity of a pixel in a camera image.
A number of solutions exist to the problem of detecting and describing point features: the Harris corner detector [131], SIFT [37], SURF [38], FAST [132], BRIEF [133], and ORB [134] are some historically popular examples. These methods show an improvement in the solution over the last two decades, with a general emphasis on efficiency for real-time application. The solutions typically detect salient points in a camera image based on pixel intensity, and describe the point using the variation in intensity of the nearby pixels. This descriptor can be used to find matching points in other camera images, which can be used for localization and mapping, where the point features make up the map. Similar methods could be used to extract features from data from other sensors such as sonar and lidar.
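Descriptor matching can be sketched for binary descriptors of the kind produced by BRIEF and ORB, which are compared using the Hamming distance. The 8-bit descriptors, the acceptance threshold `max_distance` and the function names below are assumptions chosen to keep the example readable; real binary descriptors are considerably longer.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match(descriptors_a, descriptors_b, max_distance=2):
    """Brute-force nearest-neighbour matching; returns (index_a, index_b) pairs."""
    matches = []
    for i, d in enumerate(descriptors_a):
        j = min(range(len(descriptors_b)),
                key=lambda k: hamming(d, descriptors_b[k]))
        if hamming(d, descriptors_b[j]) <= max_distance:
            matches.append((i, j))
    return matches


image_a = [0b10110010, 0b01001101]
image_b = [0b01001111, 0b10110000, 0b11111111]
pairs = match(image_a, image_b)
```

Each descriptor in the first image is paired with its closest descriptor in the second image, and pairs whose distance exceeds the threshold are rejected. In a pipe, where many wall patches look alike, a stricter threshold or an additional geometric check would probably be needed to limit false matches.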

2) LARGE FEATURES
A typical robot environment might be made up of walls, doors, furniture, and people, which can be represented as geometric features. However, the variety of possible features could be much larger depending on the application, and there would be considerable variation in features of each type suggested. Early work in SLAM used simple geometric representations such as planes, cylinders and corners [135]. Later SLAM algorithms used recognition of common features such as walls and doors [136]-[138]. An improvement to accuracy using these methods comes at the cost of reduced flexibility, and environments with unusually shaped walls and doors will be challenging for the algorithm. Knowledge of a pipe's cylindrical shape has been used for localization in pipes (Fig. 4c) [139].
FIGURE 4. ... A three-dimensional grid would be needed for a cylindrical pipe, but would be hard to visualize. (c) A large feature map, where each feature is a cylindrical pipe, parameterized by its position, orientation, and shape. (d) A topological map of a large pipe network. Each node is a junction or manhole, and is connected to other nodes by pipes.
Features such as walls, or pipes in this application, can be represented in a map in a parameterised form such as B-splines. This increases the flexibility of the map representation, and has been shown to be applied in an efficient SLAM algorithm [140], [141].

B. DENSE MAPS: GRID MAPS AND POINT CLOUDS

1) GRID MAPS
A continuous metric space can be decomposed into a grid of discrete, finite sized cells. In early work on this topic these cells were relatively large and typically corresponded to notable features in the environment [142], [143]. However, works using a discrete representation of the space later diverged: some became known as topological maps, which continue to use a coarser representation of the map, and others became metric grid localization, which uses a finer grid representation [144].
In an occupancy grid map (Fig. 4b) [145], [146], each cell has a probability of being occupied by an object, or being empty. The occupancy probability of each cell can be updated recursively using new sensor information. Grid mapping can be performed in either 2D or 3D [147]. A key limitation of grid maps is that they require large amounts of memory if mapping over a large spatial scale at high grid resolution. Memory efficient solutions to storing grids exist using trees for both 2D [148] and 3D maps [149], [150].
The grid representation gives flexibility to the representation of the probability distribution of the position of features in the map and the position of the robot. Where an occupancy grid approach is used, the features in the map do not need to be interpreted or undergo data association, which reduces the requirements of the front-end perception module. However, there is an inevitable loss in precision due to the discretisation of the map. In order to improve precision, grid cells need to be made small, however, as noted above this increases memory requirements.
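The recursive cell update and the memory trade-off can both be sketched briefly. The log-odds form used below is a common way of implementing the recursive occupancy update; the uniform prior of 0.5, the inverse sensor model value of 0.7, the grid sizes and the function names are assumptions for illustration.

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def update_cell(log_odds, p_occupied_given_z):
    """Recursive update of one cell, assuming a uniform prior of 0.5."""
    return log_odds + logit(p_occupied_given_z)


def probability(log_odds):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))


def grid_cells(extent_m, resolution_m, dims):
    """Number of cells in a square (dims=2) or cubic (dims=3) grid."""
    return round(extent_m / resolution_m) ** dims


cell = 0.0                       # log-odds of the 0.5 prior
for _ in range(2):               # two observations that suggest "occupied"
    cell = update_cell(cell, 0.7)
occupancy = probability(cell)

cells_2d = grid_cells(10.0, 0.05, 2)
cells_3d = grid_cells(10.0, 0.05, 3)
```

Two consistent observations raise the occupancy probability from 0.5 to roughly 0.84. On the memory side, a 10 m extent at 5 cm resolution needs 40,000 cells in 2D but 8,000,000 cells in 3D, which illustrates why tree-based storage is attractive for robots with limited memory.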

2) POINT CLOUDS
A point cloud representation (Fig. 4) is a set of data points representing the position of features observed by the robot in the environment, typically with only position data and no further data. This might represent the observation of objects using a lidar laser scanner, or from a stereo camera. The small amount of information contained in each point means that point clouds can contain a relatively large number of individual data points. These data points are often processed as a group, so matches might be found between a new sensor scan and the existing cloud of points, for example. Point clouds have been used in a number of 3D SLAM algorithms [116], [151]. However, they do tend to be memory intensive and may not be well suited to small robots in pipes with limited memory capacity.

3) SURFACE REPRESENTATIONS
Dense surface representations acknowledge that the point features detected by cameras or lidar are part of a surface in 3D space. Surfel maps [152] (similarly named to pixel and voxel maps) have been applied to SLAM using depth cameras [153] and using 3D laser range data [154]. Truncated signed distance fields have also been applied to data from depth cameras [155].
These dense surface representations use more of the information in the sensor data, rather than extracting discrete features, which can give good performance even when sensing a surface with low texture, and the use of the surface as a concept (as opposed to point features in space) is easily applicable to the pipe environment which is made up largely of simple surfaces. However, as with point clouds, these representations are computationally expensive which is detrimental to the application to small robots in pipes.

C. TOPOLOGICAL MAPS
A topological map (Fig. 4d) is an alternative to a metric representation of the robot's environment and state. In this case, the map is described as a set of discrete places defined by their connectivity rather than necessarily their metric relationship. A detailed review on general topological SLAM is found in [156].
In typical mobile robot environments the environment might be discretized into a set of rooms, for example; however, the problem of discretizing the environment has a variety of possible solutions. These include using gateways in the environment such as doors [157], using the meet points of lines equidistant from objects in the environment [158], using a square grid of cells of a fixed size [142], and separating places into nodes where the robot may turn and hallways which connect these nodes [159]. It is known, however, that finding distinctive places in an environment can depend on the sensor system for a given robot [160]. Without much abstraction, a pipe network can accurately be described similarly to nodes and hallways [159], which gives a natural representation of the environment.
An advantage of a topological representation is the reduction in computational cost of localization and navigation in large environments [156]. Conversely, there is an inevitable loss of precision in the map representation, and as in many problems, a compromise must be made between cost and accuracy. Another advantage of a topological representation is that the general topological structure may be known even when precise metric information might not be available. This is especially applicable to the pipe environment, where the precise position of buried pipes may be unknown, but their connectivity can be assumed if the system is functioning as intended. Topological maps have been used in algorithms developed for localization in pipes [161], [162].
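A topological pipe map and a route query over it can be sketched with an adjacency list and a breadth-first search. The node names (manholes `MH*`, junctions `J*`) and the network layout are hypothetical; only connectivity is stored, with no metric information.

```python
from collections import deque


def shortest_route(adjacency, start, goal):
    """Breadth-first search: route with the fewest pipe segments, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            route = []
            while node is not None:
                route.append(node)
                node = parents[node]
            return route[::-1]
        for neighbour in adjacency[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical network: nodes are manholes or junctions, edges are pipes.
network = {
    "MH1": ["J1"],
    "J1": ["MH1", "J2", "MH2"],
    "J2": ["J1", "MH3"],
    "MH2": ["J1"],
    "MH3": ["J2"],
}
route = shortest_route(network, "MH1", "MH3")
```

The search visits each node at most once, so its cost depends on the number of junctions and pipes rather than on the physical length of the network, which reflects the computational advantage noted above.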

D. SEMANTIC MAPS
Semantic maps can be described as a map which contains both spatial information about the environment and classification of features, where further knowledge about these classes is available for reasoning [163]. They are therefore regarded as an enhanced map, with both geometric information and high-level qualitative features that have semantic meaning. Semantic SLAM methods are reviewed in [164].
Features with semantic meaning in pipes include:
• pipe-joints,
• customer connections,
• fire hydrant connections, and
• manholes.
These features are typical for clean and wastewater pipes and have been used as landmarks in pipe robots for many years: infrared range sensors have been used for detecting inlets in pipes [165], laser scanners for detecting elbows [166] and T-junctions [167], and vision for detecting elbows and branches (junctions) [168], [169].
One potential advantage of using semantic maps in the pipes domain is that because existing maps held by water utilities label certain features such as manholes and valves, these features already have semantic meaning. Therefore, it is natural to include and exploit these semantic labels in prior maps of the pipe network which can then be used in semantic SLAM.

E. HYBRID MAPS
As noted above, topological maps are well suited to pipe networks and particularly the problems of path planning and exploration. Metric information is also needed, generally, to more precisely localise any defects encountered. Hybrid maps are well suited to this combination of needs, such as hybrid metric/topological SLAM [170]- [172], hybrid metric/semantic SLAM [173], [174] and hybrid semantic/topological [175]. These types of hybrid maps, combining topological and metric information, have been used successfully in mapping gas pipelines [176] and robust methods have been proposed for localization in sewer/water pipes [162].
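One simple way to picture a hybrid metric/topological representation is to store a position as a pipe identifier plus a distance along that pipe, and to convert it to plan coordinates only when needed. The sketch below assumes straight pipes between nodes with known plan coordinates; the node coordinates, pipe identifiers and function names are hypothetical and do not correspond to the methods of [170]-[176].

```python
import math


def pipe_length(nodes, pipes, pipe_id):
    """Straight-line length of a pipe between its two end nodes."""
    (x0, y0), (x1, y1) = (nodes[n] for n in pipes[pipe_id])
    return math.hypot(x1 - x0, y1 - y0)


def locate(nodes, pipes, pipe_id, offset_m):
    """Plan coordinates of a point offset_m along a pipe from its first node."""
    (x0, y0), (x1, y1) = (nodes[n] for n in pipes[pipe_id])
    fraction = offset_m / math.hypot(x1 - x0, y1 - y0)
    return (x0 + fraction * (x1 - x0), y0 + fraction * (y1 - y0))


nodes = {"MH1": (0.0, 0.0), "J1": (30.0, 40.0)}    # plan coordinates in metres
pipes = {"P1": ("MH1", "J1")}                       # topological edges

fault = ("P1", 10.0)                                # defect 10 m along pipe P1
fault_xy = locate(nodes, pipes, *fault)
```

Path planning can then operate on the pipe graph alone, whilst a defect report carries the metric offset needed to decide where to excavate.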

F. COMPARISON OF MAPS
Each of the types of map described here has advantages and disadvantages in general robot localization. Typically there is a trade-off between precision and computation, and between flexibility and computation. In the application to robots in pipes compared to the general case, computation is limited significantly, flexibility is less necessary, and required precision can be variable, depending on whether the robot is trying to navigate or trying to precisely locate a fault. Therefore an effective map representation for the pipe environment will likely be a hybrid representation, using the environment topology to an extent for efficient path planning and exploration, and a metric representation to precisely map the network and locate faults.

VI. SENSORS FOR MAPPING AND LOCALIZATION IN WATER AND SEWER PIPES
Sensors are a key factor to consider when designing a SLAM system. The back-end algorithms tend to be common and interchangeable across domains, but the sensors need to be selected to suit the environment. This section reviews different sensors that have been used for robot navigation in pipes.

A. INERTIAL AND ODOMETRY DEAD RECKONING SENSORS WITH DRIFT CORRECTION
The simplest methods of in-pipe localization for robots have been based on dead-reckoning techniques, usually combining an inertial measurement unit (IMU) with some form of odometry [177], [178]. Odometry sensing has been used with tether systems [179], which limits the distance the robot is able to travel, and alternatively on-board wheel odometry for untethered robots [180], which is less restrictive. The accuracy of IMU-odometry localization systems has recently been analysed in [181], demonstrating that errors using a high-grade (quasi-tactical) IMU were not more than 0.25% and 0.1% of the pipe lengths in the horizontal and vertical directions respectively (tested pipe lengths ranged from 100 m to 1700 m). Micro-electromechanical system (MEMS) IMUs have also been proposed for use in small diameter pipes, because of their reduced size, which might require improved algorithms to compensate for their lower accuracy (e.g. use of the cubature Kalman filter) [182].
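To put the percentage bounds reported in [181] into absolute terms, the short calculation below converts them to metres at the two ends of the tested range of pipe lengths. The function name is hypothetical and the calculation is simple arithmetic on the figures quoted above.

```python
def max_position_error(pipe_length_m, error_percent):
    """Upper bound on position error for a bound given as a percentage of length."""
    return pipe_length_m * error_percent / 100.0


bounds = {
    length: (max_position_error(length, 0.25),   # horizontal bound in metres
             max_position_error(length, 0.1))    # vertical bound in metres
    for length in (100.0, 1700.0)
}
```

The bounds are therefore about 0.25 m horizontally and 0.1 m vertically for a 100 m pipe, growing to about 4.25 m and 1.7 m for a 1700 m pipe.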
Drift is the key problem with dead-reckoning sensors, and so dead-reckoning has also been combined with drift correction methods using detected landmark locations. The landmarks can either be naturally occurring, such as pipe-length joints [180], [183]-[185], or deliberately added to the pipe network, such as GPS-located above ground reference stations [186], [187]. Pipe-length joints can be detected using magnetic flux leakage and electromagnetic acoustic transducers [180], [184], or from the vibration signal from an accelerometer as the robot passes through the joint [183].
The use of naturally occurring landmarks for drift correction, such as pipe-length joints, adds minimal cost to the localization solution (i.e. just the cost of the sensors), but is only applicable to scenarios where the landmarks have some a priori known position, e.g. in certain pipelines where each pipe length is known with certainty. This type of approach has been researched in water distribution pipes, combined with IMU dead-reckoning [185], and is promising because it is low-cost and provides the needed drift correction to dead reckoning methods. The disadvantage with deliberately adding reference landmarks to a pipe network is that it increases the cost and could become prohibitively expensive when considering the hundreds of thousands of kilometres of existing pipe networks.
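Drift correction from joints at known positions can be sketched as a one-dimensional Kalman filter along the pipe axis. This is an illustrative toy rather than the method of [185]: the uniform joint spacing, the 2% odometry scale error, the noise variances `q` and `r`, the nearest-joint association rule and the function names are all assumptions.

```python
def predict(x, p, odometry, q):
    """Dead-reckoning step: the position advances and its variance grows."""
    return x + odometry, p + q


def correct(x, p, joint_position, r):
    """Kalman update when a joint at a known position is detected."""
    gain = p / (p + r)
    return x + gain * (joint_position - x), (1.0 - gain) * p


def nearest_joint(x, pipe_section_length):
    """Associate a detection with the closest joint, assuming uniform spacing."""
    return round(x / pipe_section_length) * pipe_section_length


x, p = 0.0, 0.0
for _ in range(6):                      # six 1 m steps, odometry reads 2% high
    x, p = predict(x, p, odometry=1.02, q=0.01)
drifted = x

joint = nearest_joint(x, pipe_section_length=6.0)   # joint detected here
x, p = correct(x, p, joint, r=0.0001)
```

After six steps the dead-reckoned position has drifted to about 6.12 m; the joint detection pulls the estimate back to within a millimetre of 6.0 m and collapses the variance. The nearest-joint rule only works whilst the accumulated drift stays below half a pipe section, which is one reason why reliable joint detection matters.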
Sensor motes for water distribution pipes have been proposed as an alternative to conventional robots, and could use an IMU plus additional sensing for drift correction in localization [188]. The sensor mote is passive and carried by flow through the pipe network. The key advantages of these simple sensor motes, over more sophisticated robots, are that they are likely to be more easily miniaturised, be cheaper, consume less power and be more robust. In the context of small diameter water pipe environments, and low overheads in the water utilities industry preventing uptake of expensive and complex technology, these benefits are appealing.

B. CAMERAS
Cameras are very often included on pipe inspection robots so that damage can be detected by visual inspection, such as in MAKRO [189], KANTARO [190], MRINSPECT [191], PipeTron [192] and EXPLORER [193]. Therefore, cameras and visual information about the surroundings are a natural choice to pursue for navigation. Early work used vision to estimate distance travelled along the pipe only, via an image mosaicking algorithm and a laser range finder for depth perception [194]. However, modern VO systems with keyframe optimization of the type described above tend to be used now [139], [195], [196].
Fisheye cameras are most often used for VO in pipes because of the narrow structure of the environment [139], [196]-[199] (although a panoramic camera has also been tested [200]). In a pipe, when the camera is panned along the pipe axis, the distant surface of the pipe in the image suffers from projective deformations, whilst the near pipe surface appears clearly in the peripheral regions of the images. The distant features all have a low parallax angle, which degrades triangulation in VO. Hence, use of fisheye cameras tends to lead to more accurate VO estimates in pipes.
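The parallax argument can be checked with a short geometric sketch. It assumes an idealised camera moving along the pipe axis and a feature on the pipe wall; the pipe radius, baseline, distances and function name are hypothetical values chosen for illustration.

```python
import math


def parallax_angle(radius, distance_ahead, baseline):
    """Angle (rad) between the two viewing rays to a wall point.

    The camera moves forward by `baseline` along the pipe axis; the point lies
    on the wall at `radius` from the axis and `distance_ahead` in front of the
    first camera position.
    """
    before = math.atan2(radius, distance_ahead)
    after = math.atan2(radius, distance_ahead - baseline)
    return after - before


radius, baseline = 0.15, 0.1                         # metres
near = parallax_angle(radius, 0.2, baseline)         # wall point 0.2 m ahead
far = parallax_angle(radius, 5.0, baseline)          # wall point 5 m ahead
```

For these values the near wall point yields a parallax angle of roughly 0.34 rad, compared with well under a milliradian for the distant point. Near wall points are only visible at large angles from the optical axis, which is where a wide field of view helps.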
Stereo cameras have advantages over monocular systems because they have a fixed baseline between the two cameras, which aids triangulation, and they also automatically resolve scale ambiguity. In [195], two cameras were aligned facing upwards toward the pipe surface, which enabled a large amount of stereo image overlap for depth perception. By contrast, in [201], an axial-stereo vision system was proposed, where the cameras faced forward along the pipe. Simpler, monocular systems have also been used in pipes, such as in [139], [196], [202]. In [139], structured lighting was used to recover the scale factor in monocular VO by minimising the reprojection error of two laser spots. Distinct from both stereo and monocular systems are depth (RGB-D) cameras, which have the advantage that they can also be readily used to detect obstacles in the pipe [203]- [205].
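To illustrate why a fixed baseline resolves scale, the standard depth-from-disparity relation for a rectified pinhole stereo pair, Z = fB/d, can be sketched as follows. The function name and the numerical values are assumptions for illustration; the systems in [195] and [201] use their own camera models and geometries.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Metric depth of a matched feature for a rectified pinhole stereo pair.

    focal_px: focal length in pixels.
    baseline_m: known distance between the two camera centres in metres.
    disparity_px: horizontal pixel offset of the feature between the two images.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


# Assumed example: 400 px focal length, 5 cm baseline, 40 px disparity.
print(stereo_depth(400.0, 0.05, 40.0))
```

Because the baseline is known in metres, the depth is metric; a monocular system has no such reference, which is why additional cues such as structured lighting or cylinder geometry are needed to recover scale.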
Many VO algorithms for pipes leverage cylindrical information which exploits prior knowledge of the shape of the environment to improve pose estimation accuracy, and to help resolve scale ambiguity for monocular systems [196], [202], [206].
Robust methods have been developed that optimize the map points in the optimization by enforcing cylindrical regularity [139], [207]. Cylindrical regularity has also been used as prior knowledge for local pose optimization, to further improve accuracy [199]. One issue with using cylindrical information, however, is that it is necessary to detect features from the cylinder surface to match against a cylinder model. This has not been well researched because most papers assume clean, empty pipes -perhaps ideal for gas pipes, but this would not necessarily be the case in sewers (and even water pipes can have non-clean surfaces where biofilms accrue [208]). One possible solution to this problem is a method for detecting outliers proposed for 3D occupancy grid maps [209], but this is computationally intensive, so not necessarily well suited to real-time implementation.
Cameras can also be used in appearance-based SLAM methods [36], [43] and to recognise landmark features in pipes such as T-junctions, elbows etc. [168], [169], [176], [210]. In sewer pipes, VO has been combined with manhole recognition to correct drift, which is an appealing approach [203], [204]. Also, landmark recognition can be used to construct a topological pipe network map to enable the application of topological SLAM, and efficient topological path planning methods [211], [212].
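A topological pipe map of the kind described above can be represented as a graph whose nodes are recognised landmarks and whose edges are pipe lengths. The sketch below uses Dijkstra's algorithm for path planning over such a graph; the landmark names and pipe lengths are invented for illustration and the planners in [211], [212] may differ.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra over a dict-of-dicts graph {node: {neighbour: pipe_length_m}}."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, length in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (dist + length, neighbour, path + [neighbour]))
    return float("inf"), []


# Hypothetical network: two manholes, one T-junction and one elbow.
pipe_map = {
    "manhole_A": {"t_junction": 40.0},
    "t_junction": {"manhole_A": 40.0, "elbow": 25.0, "manhole_B": 60.0},
    "elbow": {"t_junction": 25.0, "manhole_B": 20.0},
    "manhole_B": {"t_junction": 60.0, "elbow": 20.0},
}
print(shortest_path(pipe_map, "manhole_A", "manhole_B"))
```

In this assumed network the route via the elbow (85 m) is shorter than the direct branch from the T-junction (100 m). The graph stores only connectivity and lengths, so it stays compact over a large network.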

C. LASER SCANNERS
Laser scanners have been used for many years for inspection in pipes [17], [213] and also for navigation in pipes including on the robots KANTARO [214] and MRINSPECT [215], [216].
The laser scanners on sewer robots have typically been used to recognise landmark features in sewer pipes, such as T-junctions, joints and elbows [167], [214], [216]. In KANTARO, the method of landmark detection is based on using the range pattern obtained from the laser to classify different types of landmark.
Laser scanners have also been used with cameras to improve the detection of landmarks -in one version of KANTARO, a stereo camera system first computes distance to a captured image and then the laser scanner is used to classify the landmark as a manhole or joint [217]. It was also noted in [217] that conventional laser scanners were too bulky at that time for sewer inspection and so they designed a custom laser-scanner more suited to the sewer pipe environment.

D. ACOUSTIC AND RADIO FREQUENCY EMITTER-RECEIVER SENSING
A number of emitter-receiver methods have emerged recently that seek to overcome the potential limitations of vision using acoustic and radio frequency sensing.
High frequency acoustic sensing (an ultrasonic sensor) [218], mid-frequency acoustic sensing (a hydrophone sensor) [219]- [221], and radio frequency (RF) sensing [222], [223] have all been applied to localization in pipes, with the similarity that they all use an emitter-receiver approach to create a type of 1D spatial map along the pipe that is continuous, and hence more feature-rich than intermittent visual landmarks.
Low-frequency acoustic sensing (a speaker and microphone) [224], [225] has been used to make absolute measurements of the position of the robot in the pipe, either relative to a fixed source [224] or using acoustic echoes [225]. In either case, the absolute measurements of position have an advantage over visual odometry, which will drift over time.
The robot localization methods developed in [218]- [221], [225] have the advantage that the emitter-receiver unit is carried on-board the robot, whilst in contrast, the other acoustic [224] and RF [222] localization methods require the emitter to be placed in a fixed location in the pipe, with the receiver on-board the robot.
The ultrasonic sensing method developed in [218] is suited to plastic water pipes because it uses an ultrasonic transceiver to sense terrain profile through the plastic pipe wall. In contrast, the sensing methods developed in [219]- [221] use a hydrophone to excite pipe vibration in metal pipes. The initial work [219], [220] used an EKF and particle filtering methods for robot localization, whilst [221] developed a GraphSLAM method for localization.

E. ABOVE-GROUND SENSING METHODS INCLUDING GROUND PENETRATING RADAR
Above ground methods are often used for pipe detection and localization. This section briefly surveys some of these methods because they can form a useful part of an in-pipe robot SLAM system, by providing prior information and for fusing with in-pipe map estimates to improve asset mapping.
Ground penetrating radar (GPR) uses electromagnetic waves (typically in the MHz range) to detect and locate below ground targets [9]. GPR has long been proposed as a tool that can perform subsurface detection of buried utility infrastructure, including pipes [226]. More recent reviews on the use of GPR for locating underground utilities are given in [10] and [227]. GPR is often combined with additional sensors to more effectively locate targets, e.g. GPS [228], electric fields [229] and cameras [230], [231], and also multiple sensor suites such as GPR, GPS, lidar, cameras and encoders [232].
GPR data can also be combined with data from utility company records [233], although such data might not always be available or accurate. Various data fusion algorithms have also been developed to fuse GPR with other data sources based on Bayesian methods [234], [235] and Dempster-Shafer methods [236], where the latter avoid the need of the Bayesian methods to specify a probability distribution for each data source.
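The difference from a Bayesian treatment can be illustrated with Dempster's rule of combination, in which a source may assign belief mass to a set of hypotheses (expressing ignorance) rather than to each hypothesis individually. This is a minimal sketch with invented mass values; it is not the fusion scheme of [236].

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozenset hypotheses."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


PIPE = frozenset({"pipe"})
NONE = frozenset({"none"})
EITHER = PIPE | NONE  # mass here means "this source cannot tell"

# Assumed masses: a GPR reading and a utility record for the same location.
gpr = {PIPE: 0.6, EITHER: 0.4}
records = {PIPE: 0.7, NONE: 0.1, EITHER: 0.2}
print(combine(gpr, records))
```

Here the GPR source leaves 0.4 of its mass uncommitted instead of being forced to split it between "pipe" and "none", which is the property the text refers to.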
Additional above ground methods have been proposed for locating buried infrastructure, for instance in [237] where multiple above ground sensors are fused including GPR, Passive Magnetic Fields (PMF), Magnetic Gradiometer (MG), Low Frequency Electromagnetic Fields (LFEM) and Vibro-Acoustics (VA). Any of these techniques have potential for fusing with in-pipe robotics to improve detection accuracy.

F. COMPARISON OF SENSORS
The different types of sensor have various advantages and disadvantages for SLAM in pipes (summarised in Table 5). The basic inertial and odometry dead reckoning sensors provide simple localization but are subject to drift, so need to be used with landmark recognition methods to correct drift.
Cameras are one of the most widely used and mature localization and mapping methods in robotics, and have been used successfully in pipes. However, cameras do have certain disadvantages for use in pipes: the environment might lack visual features and so present problems of perceptual aliasing; vision-processing is usually computationally intensive so can require sophisticated and large computational hardware (which can be power intensive); the camera lens might become dirty and occluded by objects, particularly in sewer pipes; and the pipe environment is dark, so a light source is required (which consumes battery power).
Laser scanners/lidar are another popular type of sensor for SLAM, but have only been used, it would appear, for landmark recognition of e.g. manholes and elbow bends, not for pose estimation along the pipe lengths (this may be due to the fact that scan-to-scan pose estimation does not work well in pipe lengths because they tend to lack useful discriminative features for range-based pose estimation). Lidar is advantageous for pipes because it does not require a light source, but has the drawback that the sensors tend to be more costly and bulky than cameras.
Acoustic sensing is appealing for pipes because acoustic waves tend to propagate well in this environment, and they do not require a light source. However, some acoustic methods require a fixed sound source, which limits mobility.
Above-ground methods are potentially useful where available but they would appear to be more complex to implement autonomously due to the fact they would need to operate in the unstructured above-ground environment.
Therefore, it would appear that multi-sensor data fusion is the most appealing approach to sensor choice for SLAM in water and sewer pipes. Visual odometry, acoustic methods and inertial sensing appear well suited to localization along pipe lengths. Drift correction methods are critical for odometry-type methods, using landmarks such as manholes or potentially pipe joints if they occur at predictable locations. Vision and laser scanners appear well suited to the problem of landmark recognition at manholes, elbows and junctions. A publicly available dataset incorporating multiple sensors for localization in sewer tunnels is described in [238], including inertial, camera and laser sensors, which will enable the interested reader to investigate these sensors for themselves.

VII. FUTURE CHALLENGES
A. SINGLE ROBOT SLAM IN PIPES
The methods discussed so far for SLAM in water and sewer pipes have mainly been tested in lab environments or small scale outdoor experiments. To bridge the gap to real-world use, a number of future challenges must be overcome. Cadena et al. [22], in one of the most recent and comprehensive reviews of modern SLAM, highlight the following as general areas for SLAM to address in future: robust performance, high-level understanding, resource awareness, and task-driven inference. These are particularly relevant in the domain of pipes, and aspects of them are considered below.

1) LONG TERM ROBUST AUTONOMOUS OPERATION AND SCALABILITY
SLAM for water and sewer pipe networks must be robust in the long term. It would be unacceptable for the water industry to take up a robotics inspection system that becomes lost in the pipe network, adding to the problem of faults rather than solving the problem. The review in Cadena et al. [22] concludes that long term robust SLAM is not currently possible, and much fundamental work remains to be done to solve this problem. Recent work has addressed developing more robust SLAM back-end optimization algorithms using multiple hypothesis methods in MH-iSAM2 (multiple hypothesis iSAM2) [129]. However, guaranteed fail-safe SLAM is still not a solved problem.
So the question remains of how to leverage current SLAM technology for robust operation in the domain of pipes in the near future. One practical way to address this problem is to modify the pipe network with locating beacons. This approach bears similarity to the idea of using above-ground reference stations mentioned above [186], [187] but modified for buried water and sewer pipe networks. There is a rising interest in developing smart monitoring for infrastructure, for instance smart pipes [239] and smart manholes [240], with RFID tags that communicate with the cloud, which can be used for autonomous pipeline monitoring [241], [242]. These devices could be placed at intervals through a pipe network to provide recovery points for re-localization if the SLAM system fails. Such modifications of the pipe network might be costly, but this concern is alleviated if we consider that some amount of modification of the pipe network would be necessary anyway to provide communication hubs for transmitting inspection data.
Long term operation in pipes also raises the issue of scalability, where data storage and processing needs will continue to grow without bound. To address the problem of scalability, if we assume that pipe inspection robots are small, low-powered devices and that the pipe network map changes very slowly over time, an appealing strategy would be to only update the map at communication hubs by transmitting data to the cloud. This would mean robots perform localization-only whilst travelling the pipe network, in-between map updates whilst stationary at hubs, alleviating the computational burden of performing full SLAM during real-time navigation. This synergy between updating the map at communication hubs, possibly via the cloud, and robustly re-localising at hubs with full certainty would provide a natural solution to fail-safe, scalable SLAM in pipe networks. The challenge becomes one of minimising the cost of modifying the pipe network with smart communication hubs and ensuring sufficient coverage to guarantee fail-safe operation of the SLAM system in between returns to a hub. This implies that both the SLAM method and the smart pipe infrastructure should be co-designed and evaluated in a single complete framework in order to minimise costs and maximise robustness.

2) FAILURE-AWARE AND TRUSTWORTHY ROBOTICS
A major emerging issue across robotics and AI is the development of trustworthy solutions [243]- [248]. In the domain of water and sewer pipes, inspection robotics will provide key information on whether there is a fault, the type of fault, the severity, and also the location. In turn this will lead to human operators making decisions on whether to effect a repair and where to do it if so. Decisions to repair buried pipes will lead to excavations that are costly and disruptive; if the robot system makes errors this will severely impact the uptake of the technology (and potentially harm the uptake of robotics for many years).
Therefore, developing trustworthy robotic solutions for pipe networks is a key challenge for the future. This requires solutions able to work under different environmental changes and provide some characterisation of the level of trust in the solutions. It will be important to know under what conditions the approaches are fully trustworthy -for instance if a camera has become partially obscured, leading to inaccuracies in a visual SLAM system, the robot should be aware of this and be able to communicate the information. In general, the robot system should be failure-aware. This means, in the context of SLAM, that the robot should be able to intelligently detect failure of the SLAM system, if it occurs, and be able to communicate this to human operators.
Although currently a huge effort is focused on the development of reliable and trustworthy methods, the development of trustworthy solutions for intelligent robot systems, especially in pipe networks, is still lacking.

3) PATH PLANNING AND ACTIVE SLAM
The main tasks of a water/sewer pipe inspection robot are to detect and locate faults, and also, as outlined here, map the pipe network. In order to do this, the robot must explore the pipe network thoroughly to map it, and also continue to traverse the network in order to ensure complete and consistent coverage for long-term fault detection [249]. These primary goals of the robot link naturally to an active SLAM framework [250], which refers to the full problem of path planning in the context of the mission, as well as mapping and localization.
Active SLAM has been addressed using many of the main frameworks for back-end pose and map estimation including EKF-SLAM [250], particle filtering SLAM [251] and smoothing/optimization SLAM [252], [253]. The typical formulation of the active SLAM problem consists of three steps [251]:
1) The robot identifies possible targets to visit: frontier targets, from the boundary of the explored region of the robot's map, where the boundary can be identified by regions where the map certainty drops below a threshold; and trajectory targets, where map certainty is high, which the robot can return to in order to reduce location uncertainty.
2) The robot predicts the expected information gain associated with visiting a frontier target, along with the risk of incorrect trajectory approximation.
3) The robot travels to the target and then decides if it is necessary to continue or terminate the task.
Active SLAM has not been addressed in the context of buried pipe networks but would appear to be an appealing strategy for this domain. One of the main challenges still to address is the prediction of the expected information gain [22], which can be computationally intensive and so not well suited to small, low-powered pipe robots. Therefore, computationally efficient solutions need to be found for this problem. In addition, active SLAM in water/sewer pipes will be complicated by the fact that it is generally easier and more energy efficient to move in the direction of flow, which might nuance solutions for this environment.
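The target-selection step can be sketched as a simple utility calculation that trades expected information gain against travel cost, with an extra penalty for moving against the flow. Everything here is a hypothetical illustration: the `Target` fields, the cost weights and the gain values are assumed, and predicting the gain itself (the expensive part noted above) is taken as given.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    kind: str             # "frontier" or "trajectory"
    expected_gain: float  # predicted information gain (arbitrary units, assumed given)
    distance_m: float     # path length to the target through the pipe network
    upstream: bool        # True if reaching it means travelling against the flow


def utility(target, cost_per_m=0.01, upstream_penalty=3.0):
    """Expected gain minus travel cost; upstream travel is assumed to cost more energy."""
    multiplier = upstream_penalty if target.upstream else 1.0
    return target.expected_gain - target.distance_m * cost_per_m * multiplier


def select_target(targets, **weights):
    """Pick the candidate with the highest utility."""
    return max(targets, key=lambda t: utility(t, **weights))


candidates = [
    Target("frontier_upstream", "frontier", 2.0, 100.0, True),
    Target("frontier_downstream", "frontier", 1.5, 80.0, False),
    Target("known_manhole", "trajectory", 0.5, 20.0, False),
]
print(select_target(candidates).name)
```

With these assumed weights the downstream frontier is chosen even though the upstream frontier offers a larger gain, which shows how flow direction could change the decision in this environment.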

B. PRIOR MAP GENERATION FOR SLAM IN WATER AND SEWER PIPES
SLAM methods are often developed with the assumption that there is zero prior knowledge of the environment. However, this is far from the case for water and sewer pipe networks. Prior information on pipe network maps can come from a variety of sources. A challenge for the future is to leverage these data sources, fuse them together and synthesise prior maps for robot navigation and SLAM.
Geographical information systems (GIS) have been used for many years in the water industry to capture, store, manipulate, analyze, and display spatial information for water and sewer pipe networks [254]- [256]. GIS pertaining to water/sewer pipe networks tend to include pipe locations and access points such as manholes and fire hydrants. Manhole recognition from inside the sewer pipe, combined with GIS map data, has been used to correct drift in robot localization [204]. However, recorded locations of pipes and access points in GIS maps might differ from what is actually present on the ground due to errors in data collection and data entry [255]. Therefore, the use of GIS alone would seem insufficient for the task of synthesising prior maps for SLAM.
There is a growing literature on mapping above-ground infrastructure using observations gathered from a variety of sensors including GPS, in conjunction with object recognition using machine learning. In particular, manholes can be mapped from above ground, and can also be detected below ground, within the pipe [257], hence make ideal landmarks. The problem of automatically mapping manholes from above ground has been studied using cameras [258] and laser scanners [259]- [261], as well as from the air using cameras in unmanned aerial vehicles (UAVs) [262]. More recently, DCNNs have been applied to the task of object detection and recognition for manholes [263], [264], including localising manholes [265]. Manhole cover detection with DCNNs has also been used to reconstruct the likely underground pipe locations between manholes using industry rules [266], [267].
The availability of GIS maps and data-driven maps of above-ground infrastructure raises the challenge of fusing these sources to generate a map prior. In [268] the authors use GIS data to improve the localization and detection of infrastructure objects in camera images in conjunction with a standard object detection algorithm [269]. The problem of fusing spatial data, e.g. GIS data, with images has also been studied in [270]. The fusion of different objects from different data sets (GIS and data-driven) also raises the challenge of handling uncertainty. Recent attempts have been made to quantify uncertainty in object detection from images with deep neural networks [271] but in general this is not a solved problem. Bayesian data fusion has been used to fuse data both in 2D in [234], where the focus was on GPR and GIS data, which has more recently been extended to Bayesian fusion in 3D using a wide variety of sensors [235].
There are not many studies currently on how to effectively incorporate priors into a SLAM system. In [272], a framework was developed for using Bayesian priors for SLAM in buildings. One issue highlighted was the care needed in tuning the certainty associated with the map prior: too much certainty leads to the map prior dominating the SLAM system, even when the robot senses discrepancies in live operation, and vice-versa. There is an opportunity, therefore, to generalise the results from [272] to form a framework for using prior knowledge in SLAM for pipe networks.
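The sensitivity to prior certainty can be seen in the simplest possible case: a scalar Gaussian prior on a landmark position fused with one Gaussian measurement. This sketch is not the framework of [272]; the function name and all numbers are assumptions chosen to show the effect.

```python
def fuse_prior(prior_mean, prior_var, measurement, meas_var):
    """Bayesian update of a scalar Gaussian prior with one Gaussian measurement."""
    gain = prior_var / (prior_var + meas_var)
    mean = prior_mean + gain * (measurement - prior_mean)
    var = (1.0 - gain) * prior_var
    return mean, var


# Assumed case: a GIS record places a manhole at 100 m, but the robot observes it at 104 m.
overconfident, _ = fuse_prior(100.0, prior_var=0.01, measurement=104.0, meas_var=1.0)
weak, _ = fuse_prior(100.0, prior_var=100.0, measurement=104.0, meas_var=1.0)
print(overconfident, weak)
```

With a very small prior variance the estimate barely moves from the recorded 100 m despite the live observation; with a very large one the prior contributes almost nothing. Choosing a variance that reflects the actual quality of the records is the tuning problem described above.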

C. MULTI-ROBOT SLAM IN WATER AND SEWER PIPES
Cities of the future will likely have teams of robots cooperatively mapping water and sewer pipes. This will be essential to ensure full coverage of the pipe network. Multi-robot SLAM in water and sewer pipes has not yet been addressed in the literature, and raises a number of problems beyond single robot SLAM, regarding both the data and the mapping-localization algorithms, which are covered in this section.
Key questions on data include [273]:
1) What type of data will be shared? Will it be raw or processed data?
2) Considering issues of limited coverage and bandwidth, how will data be shared?
3) Where and how will data be processed? With centralised, decentralised, distributed, or hybrid architectures and methods?
4) Are the methods scalable and applicable to large pipe networks? How do we deal with missing data and data transmitted at different sampling rates?
Regarding processing of the data, the three main approaches to multi-robot data fusion are centralised, distributed and decentralised [274], [275]:
1) Centralised data fusion: raw sensor data from multiple sensors is fused in a centralised processing node to produce a state estimate.
2) Distributed data fusion: raw sensor data is processed locally to produce a state estimate in each sensor, then the multiple state estimates are fused in a centralised processing node. This could be either:
a) Fusion under known data correlation.
b) Fusion under unknown data correlation. This requires additional estimation.
3) Decentralised data fusion: raw sensor data is processed locally and fused locally at each node.
These questions are highly relevant to the inspection of real pipe networks due to potentially limited communication range, bandwidth, memory and processing power for small robots in pipes.
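The distinction between fusing state estimates under known and unknown correlation can be illustrated in the scalar case. The sketch below contrasts the standard fusion of two independent estimates with covariance intersection, a well-known conservative rule for unknown correlation that is used here as an example and is not taken from [274], [275]. The fixed weight `w` and the numbers are assumptions.

```python
def fuse_independent(xa, pa, xb, pb):
    """Fuse two scalar estimates assuming their errors are uncorrelated."""
    p = 1.0 / (1.0 / pa + 1.0 / pb)
    x = p * (xa / pa + xb / pb)
    return x, p


def fuse_covariance_intersection(xa, pa, xb, pb, w=0.5):
    """Covariance intersection: consistent for any unknown correlation, 0 <= w <= 1."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    p = 1.0 / (w / pa + (1.0 - w) / pb)
    x = p * (w * xa / pa + (1.0 - w) * xb / pb)
    return x, p


# Two robots report the same landmark at 10 m and 12 m, each with variance 1 m^2.
print(fuse_independent(10.0, 1.0, 12.0, 1.0))
print(fuse_covariance_intersection(10.0, 1.0, 12.0, 1.0))
```

Both rules give the same fused position here, but the independence assumption halves the variance, whereas covariance intersection does not reduce it. If the two estimates share information (for example, both were derived from the same prior map), the smaller variance would be overconfident.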
The centralised fusion approach is theoretically optimal but may not be ideal in practice because it requires large communication bandwidth, large processing capability in the central node, and is not robust due to the possibility of central node failure. However, if the cloud is used to perform the centralised fusion then this can be considered robust to failure, and we could assume that it would also have sufficient bandwidth and processing capability, and is appealing if robots have to return to communication hubs to transmit data on mapping and fault detection above-ground.
The algorithms for multi-robot SLAM tend to be extensions of the single robot SLAM algorithms described above in section IV, such as filtering methods based on the EKF [276], the extended information filter [277], [278], and the particle filter [279], and smoothing/optimization methods based on GraphSLAM [280], SAM [281], [282] and iSAM2 [283]. There are also a number of works that specifically address multi-robot visual SLAM using both centralised [284], [285] and decentralised/distributed fusion [286], [287], where the approach in [285] specifically targets the use of the cloud to perform centralised processing for the computationally intensive map optimization and data storage -an appealing approach for low-powered robots in pipes.
A key part of multi-robot SLAM is merging maps from all robots to construct a single, global map of the environment. To merge maps, generally either the initial poses of the robots must be known, or the robots must rendezvous to ascertain each other's pose, or the maps must overlap [273]. Robust methods for selecting consistent measurements for map merging now exist to solve this problem [288]. Map merging is likely to be simplified in the pipe environment because it would be feasible to obtain the initial pose of each robot by taking GPS readings of the robots at the point of entry to the pipe network, or exploiting prior knowledge of the location of the pipe access points -this would significantly simplify multi-robot SLAM in pipes. Robots could also potentially rendezvous in pipes to communicate and share map data, which would also serve to provide line-of-sight pose estimation. Alternatively, the robots could rendezvous with communication hubs throughout the pipe network and transmit their own estimated pose for centralised data-fusion and SLAM.
Finally, it is worth noting that if the initial pose of the robot is known in the world coordinate frame (from the point of entry into the pipe network), and features with known locations in the above-ground world coordinate frame are used in SLAM, such as manholes as in [203], [204] or fire hydrants, then the mapping can be directly performed by each robot in the world coordinate frame, which should greatly simplify the map merging process. The main problem would therefore be the uncertainty around multiple robots recognising common landmarks in the pipe network on which to base map merging, i.e. the data association problem. Robust methods would need developing to handle this problem, possibly based on multi-hypothesis SLAM.
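The simplification offered by a known entry pose can be sketched in 2D: each robot maps landmarks in its own local frame, and a rigid-body transform using the entry pose places them in the shared world frame. The function name, poses and landmark coordinates are assumptions for illustration, and noise is ignored.

```python
import math


def to_world(entry_pose, local_point):
    """Transform a 2D point from a robot's local map frame into the world frame.

    entry_pose: (x, y, heading_rad) of the robot's map origin in the world frame,
                e.g. from a GPS fix at the access point (assumed known).
    local_point: (x, y) of a landmark in the robot's own map.
    """
    x0, y0, heading = entry_pose
    lx, ly = local_point
    return (
        x0 + math.cos(heading) * lx - math.sin(heading) * ly,
        y0 + math.sin(heading) * lx + math.cos(heading) * ly,
    )


# Robot A enters at the world origin heading east; robot B enters 30 m north of
# the shared manhole, heading south. Both observe the same manhole.
manhole_from_a = to_world((0.0, 0.0, 0.0), (50.0, 0.0))
manhole_from_b = to_world((50.0, 30.0, -math.pi / 2), (30.0, 0.0))
print(manhole_from_a, manhole_from_b)
```

In this noise-free example both robots place the manhole at the same world coordinates, so the maps align without any search over relative poses. With real measurements the two positions would differ, and deciding whether they refer to the same landmark is the data association problem noted above.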

VIII. CONCLUSION
This paper has presented a review of SLAM for inspection robots operating in buried water and sewer pipes. The review focused initially on the motivation that the buried pipe networks represent huge current and future investment and are an ideal domain where autonomous robots can make an important impact in future smart cities.
We reviewed the requirements of a robot system, focusing on mapping the buried assets and locating defects. We reviewed the main SLAM methods used in robotics currently (where smoothing/optimization methods tend to dominate over EKF and particle filtering methods popularised in the early days of SLAM), and brought recent reviews up to date with a discussion of the recent impact of deep learning in loop closing and visual odometry. We considered different map-types used in SLAM and concluded that hybrid methods, e.g. metric/topological, metric/semantic, and semantic/topological show promise for this environment, where topological maps are useful over a large spatial scale for efficient path planning, whilst metric information is needed for e.g. localising defects. We reviewed the various sensors-types used in SLAM and found that a wide range of sensors have been successfully used in pipes, implying that multi-sensor data fusion is probably the most appealing approach to maximise robustness.
Finally, we looked at future challenges focusing on single robots, with challenges of long term robust and scalable operation, trustworthy robotics, and active SLAM; the development of prior maps using existing data with above ground landmark mapping; and SLAM in pipes with multi-robot teams.