A Review on Viewpoints and Path-planning for UAV-based 3D Reconstruction

Unmanned aerial vehicles (UAVs) are widely used platforms to carry data capturing sensors for various applications. The reason for this success can be found in many aspects: the high maneuverability of UAVs, the capability of performing autonomous data acquisition, flying at different heights, and the possibility to reach almost any vantage point. The selection of appropriate viewpoints and the planning of optimal UAV trajectories is an emerging topic that aims to increase the automation, efficiency, and reliability of the data capturing process in order to achieve a dataset of the desired quality. On the other hand, 3D reconstruction using the data captured by UAVs is also attracting attention in research and industry. This review paper investigates a wide range of model-free and model-based algorithms for viewpoint and path planning for 3D reconstruction of large-scale objects. The analyzed approaches are limited to those that employ a single UAV as a data capturing platform for outdoor 3D reconstruction purposes. In addition to discussing the evaluation strategies, this paper also highlights the innovations and limitations of the investigated approaches. It concludes with a critical analysis of the existing challenges and future research perspectives.


Introduction
A complete dataset which meets predefined quality measures is the fundamental input for various applications, such as 3D reconstruction of large-scale complex objects with a high level of detail (for example, cultural heritage documentation), inspection of construction sites, and Building Information Modeling (BIM), to name a few. Many sensors and platforms for capturing high-quality images and point clouds have been introduced in the last decades. In particular, unmanned aerial vehicles (UAVs) are widely used as platforms to carry data capturing sensors. More specifically, multi-copters (from this point forward we use the term UAV to refer to multi-copters) are agile, capable of performing autonomous data acquisition, can fly at different heights, and can reach almost any vantage point (Song et al., 2021). Hence, UAV-based 3D reconstruction is still attracting attention as a promising approach for generating semantic and geometric information about man-made and natural objects.
Although it is possible to generate 3D reconstruction products of impressive quality using state-of-the-art algorithms, the success of these reconstruction methods depends to a large degree on input images with predefined specifications (Hepp et al., 2018b; Hoppe et al., 2012; Saponaro et al., 2019; Schmid et al., 2012; Seitz et al., 2006; Song et al., 2020a). Furthermore, adding more images does not always deliver a better reconstruction; diminishing returns beyond a certain point are reported in the literature (Hepp, 2018; Hornung et al., 2008; Seitz et al., 2006). Moreover, due to the limited battery capacity of current UAVs and possible restrictions in accessing the area of interest, data capturing has to be performed in a limited time (Smith et al., 2018). Hence, efficient viewpoint and path planning algorithms are required to capture the data needed for the successful reconstruction of 3D objects of various sizes, shapes, and complexities (Kaba et al., 2017; Kuang et al., 2020; Roberts, 2019; Song et al., 2020b).
The selection of appropriate viewpoints and trajectories for 3D reconstruction has been a research topic for decades (Ahmadabadian et al., 2014; Aloimonos et al., 1988; Connolly, 1985; Debus and Rodehorst, 2021; Peng and Isler, 2017; Scott et al., 2003; Song et al., 2021). In robotics, view and path planning is a research focus of many studies on 3D reconstruction, aiming to make robots explore and reconstruct unknown environments more competently (Chen et al., 2011; Scaramuzza et al., 2014; Scott et al., 2003). This task is known as the active vision or view-path-planning problem in robotics (Roberts et al., 2017; Song et al., 2021). In photogrammetry, the problem of determining the optimal viewpoints for accurate measurement of 3D scenes or objects is called camera network design (Fraser, 1984; Hoppe et al., 2012; Mason, 1997; Saadatseresht et al., 2005), where mainly the geometric aspects of the problem are investigated in controlled environments for highly accurate measurement of artificial targets using convergent imaging.

Scope of the paper:
The review includes peer-reviewed publications and theses, focusing on a wide range of model-free and model-based algorithms for viewpoint and path planning for 3D reconstruction of large-scale objects. In addition to discussing the evaluation strategies, this paper also highlights the innovations and limitations of the investigated approaches. It concludes with a critical analysis of the existing challenges and future research perspectives. Since various platforms and sensors are employed for different 3D reconstruction applications, we exclude the studies which are not within the scope of this paper.
Platform: Many researchers in robotics and other communities investigate view and path planning for moving robots such as unmanned ground vehicles (UGVs) and wheeled robots (Cao et al., 2021; Hosseininaveh and Remondino, 2021; Huang et al., 2020), autonomous underwater vehicles (Panda et al., 2020), robotic arms (Bogaerts et al., 2019; Wu et al., 2014), as well as multi-UAV systems (Jing et al., 2020; Nagasawa et al., 2021; Sadeghi et al., 2019) for optimal data capturing. Although there are some common issues between path planning for these platforms and for UAVs, the assumptions, challenges, trajectories, platform abilities, and environmental conditions can be very different. For example, in contrast to UGVs, UAVs have to consider a 6D search space for the sensor poses within the reconstruction workflow (Schmid et al., 2012). Therefore, this paper focuses on viewpoint and path planning for a single UAV.
Sensor and 3D representation: Various sensors are installed on UAVs for data capturing in different applications (Nex et al., 2022). Monocular RGB cameras (Song et al., 2020a), RGB-D sensors (Hardouin et al., 2020), depth cameras (Deris et al., 2017; Xu et al., 2021), structured light scanners (Ułanowicz and Sabak, 2021), multi-spectral cameras (DadrasJavan et al., 2019), stereo cameras, and LiDAR (Alsadik and Remondino, 2020; Bolourian and Hammad, 2020; Yoder and Scherer, 2016) are the most commonly used sensors (Yan et al., 2021). Compared to single RGB cameras, the other sensors entail special hardware and setup, which makes them very efficient for some applications. However, this usually comes at the cost of more technical requirements and a higher price, and it bounds the range of applications. Different 3D representation methods have been developed depending on the type of scanning sensor. Images acquired from a monocular camera are generally processed by MVS algorithms (Schönberger et al., 2016) to reconstruct 3D models. MVS generates a dense 3D reconstruction offline by matching the stereo correspondences of all images. This method can estimate a wide range of depths based on the various baseline distances between images; therefore, MVS is appropriate for large-scale modeling. RGB-D sensors and stereo cameras can estimate accurate and dense depth maps in real-time. The estimated depth maps can be integrated into a volumetric model or a surface model by volumetric mapping (Hornung et al., 2013) or TSDF mapping (Whelan et al., 2014), respectively. Volumetric models represent a coarse 3D shape based on occupancy probability, whereas surface models represent the precise surfaces of an object. LiDAR can acquire very accurate point cloud data in real-time. These point clouds are relatively sparse compared to the depth maps acquired from depth sensors and cannot represent dense surfaces. Therefore, LiDAR-based methods (Qin et al., 2019; Yoder and Scherer, 2016) generally employ the volumetric mapping method for scanning.
Application: Geometric criteria like completeness and accuracy of the 3D models acquired from UAV missions are the properties most discussed in the literature (Roberts et al., 2017). However, researchers have also aimed at other properties like semantic information, which can be extracted from UAV data (Popovic et al., 2017; Stache et al., 2021; Valente et al., 2013). Setting aside other applications such as precision agriculture (Basiri et al., 2022; Just et al., 2020) and indoor exploration and reconstruction (Shen et al., 2012; Sun et al., 2021; Zhu et al., 2018), we focus on viewpoint and path planning for 3D reconstruction of complex outdoor man-made objects like buildings and bridges, which is a topic of interest for many applications (Almadhoun et al., 2019). A closely related topic is coverage path planning (Bircher et al., 2015; Galceran and Carreras, 2013; Kaba et al., 2017; Peng and Isler, 2020; Shang et al., 2020; Tan et al., 2021), which deals with optimizing trajectories that cover a known 2D or 3D environment. This topic can be considered a set-cover problem (Jing et al., 2019) or an art gallery problem followed by shortest path optimization (Almadhoun et al., 2016; Heng et al., 2015; Papachristos et al., 2019a). However, since these approaches aim at an optimal coverage path with a low overlap rate and do not consider MVS heuristics and 3D reconstruction requirements, they are out of the scope of this paper.

SfM and MVS heuristics in viewpoint planning
Since many of the 3D reconstruction approaches rely on structure from motion (SfM) and multi-view stereo (MVS), we shortly list the most important heuristics as follows (see also Figure 1a): (1) Distance: the distance between camera viewpoints and the object surface defines the resulting model resolution and depends on the desired GSD and the camera intrinsics (Koch et al., 2019; Peng and Isler, 2018). (2) Multiple views: every part of the scene has to be observed from multiple views (theoretically at least two, and in practice more to increase the reliability and accuracy of the 3D reconstruction) from different perspectives with sufficient overlap between the views. Matching the corresponding points in overlapping images is the prerequisite for robust camera pose estimation and for triangulating 3D object points (Hoppe et al., 2012; Mostegel et al., 2016; Roberts et al., 2017; Schmid et al., 2012). (3) Observation and parallax angle: shallow observation angles between the optical axes of the cameras and the surface normals are preferred in MVS. Moreover, large parallax angles (a.k.a. angular separation) between cameras (also known as the B/H ratio in photogrammetry) are favored to increase the triangulation quality. However, they make it more difficult to find correspondences (matches) between the images, especially for complex 3D structures (Koch, 2020; Mostegel et al., 2016; Peng and Isler, 2019; Smith et al., 2018).
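Heuristic (1) boils down to the standard pinhole relation between viewing distance and GSD. The following minimal sketch illustrates it; the camera parameters in the example (focal length, pixel pitch) are generic illustrative values, not taken from any cited work.

```python
# Sketch of heuristic (1): the distance-to-surface / GSD trade-off for a
# pinhole camera. GSD = distance * pixel_pitch / focal_length.

def gsd_at_distance(distance_m, focal_length_mm, pixel_pitch_um):
    """Ground sampling distance (m/pixel) at a given viewing distance."""
    return distance_m * (pixel_pitch_um * 1e-6) / (focal_length_mm * 1e-3)

def distance_for_gsd(target_gsd_m, focal_length_mm, pixel_pitch_um):
    """Invert the relation: maximum viewing distance for a desired GSD."""
    return target_gsd_m * (focal_length_mm * 1e-3) / (pixel_pitch_um * 1e-6)

# e.g., a 24 mm lens with 4 um pixels at 30 m yields a 5 mm GSD,
# so a 5 mm GSD target caps the viewing distance at 30 m.
```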
Various models are used in the literature to control these parameters (Koch, 2020; Peng and Isler, 2018; Roberts, 2019). The parameters defined above are visualized in Figure 1a, and three examples of such models are illustrated in Figure 1b-d: (b) co-apex cones forcing suitable cameras to adhere to the heuristics, (c) segments of a hemisphere for viewpoint planning (Koch, 2020), and (d) the MVS-aware coverage model (Roberts, 2019). In the strategy used by Koch (2020), a hemisphere is considered around each sample point; the hemisphere is divided into four segments, and the ideal camera constellations intersect the hemisphere in different segments. In the coverage model introduced by Roberts (2019), each camera covers a disk area on the hemisphere around the sample point, and the usefulness of the cameras is measured by the total solid angle covered by all disks.

Problem formulation
Since different terms, objectives, priorities, and approaches are used for viewpoint and path planning in various communities, we formulate the problem here to provide a holistic definition that consistently covers the different strategies and nomenclature.
Determining "good", "ideal", or "reasonable" imaging positions is a common research topic in various communities such as robotics, computer graphics, computer vision, and photogrammetry (Hoppe et al., 2012). In UAV-based research, some viewpoint and path planning approaches (Hoppe et al., 2012; Schmid et al., 2012) solve the planning problem in two steps: the first step selects the optimal viewpoints while ignoring the travel budget, and the second step minimizes the cost of visiting all viewpoints computed in the first step. Other approaches solve both the viewpoint and the path planning problem in a unified manner (Hepp et al., 2018b; Roberts et al., 2017). This is mainly a domain-specific concern and depends on the application and its requirements and priorities. Giving priority to the reconstruction of the object usually leads to first solving for the camera poses (sensor placement, a.k.a. viewpoint selection) and then finding the shortest/fastest trajectory that visits all those poses. Here the constraints are defined on the object and quality has the highest priority; after computing the vantage viewpoints, the TSP (shortest Hamiltonian path between the viewpoints) is a commonly used approach in the literature. On the other hand, when priority is given to resources (flight time, trajectory length, or the number of images), the constraints are defined on the sensor (additionally constrained by the obstacles). The aim is then to gain as much as possible while being hard-constrained by the resources. For maximum gain, the classical knapsack problem could be used; however, in view-path planning the costs are not fixed (they depend on the distance to the previous viewpoint). Therefore, this is a combination of the knapsack problem and the TSP, which is called the orienteering problem (Gunawan et al., 2016; Vansteenwegen et al., 2011). The orienteering problem (a reward-collecting graph optimization problem) was first introduced by Golden et al. (1987). The objective is to maximize the total gain collected from the visited nodes; because of the limited time budget, not all available nodes may be selected. In the orienteering problem, marginal rewards are additive, but in MVS the usability of each camera is not additive and depends on heuristics defined with respect to the poses of the other cameras. Different strategies are employed to tackle this problem (Hepp et al., 2018b; Roberts et al., 2017). Generally, we formulate viewpoint and path planning as follows: considering a camera C with fixed intrinsics (at least during the data capturing) mounted on a drone, the objective of viewpoint and path planning is to find a feasible trajectory T of the drone which meets a set of constraints S and passes through optimal camera positions P = {p_1, p_2, ..., p_n} for taking high-quality pictures at optimal orientations Φ = {φ_1, φ_2, ..., φ_n}, so as to achieve a predefined quality Q as far as possible.
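The budgeted reward-vs-travel trade-off of the orienteering problem can be sketched with a simple greedy heuristic: repeatedly visit the candidate with the best reward-per-distance ratio until the travel budget runs out. This is only an illustrative toy (real orienteering solvers are far more sophisticated, and MVS rewards are not additive, as noted above); all positions, rewards, and the budget are invented values.

```python
import math

def greedy_orienteering(start, candidates, budget):
    """Greedy orienteering sketch: pick the candidate with the best
    reward-per-distance ratio until the travel budget is exhausted.
    candidates: list of ((x, y), reward) tuples."""
    pos, remaining, tour, gain = start, budget, [start], 0.0
    todo = list(candidates)
    while todo:
        def ratio(c):
            d = math.dist(pos, c[0])
            return c[1] / d if d > 0 else float("inf")
        best = max(todo, key=ratio)
        d = math.dist(pos, best[0])
        if d > remaining:
            break  # cannot afford any more travel
        remaining -= d
        gain += best[1]
        pos = best[0]
        tour.append(best[0])
        todo.remove(best)
    return tour, gain
```

With a budget of 4 and candidates at (1, 0) (reward 1), (0, 2) (reward 5), and (10, 0) (reward 2) from a start at the origin, the sketch collects only the high-ratio viewpoint at (0, 2) before the budget blocks further travel.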
The camera locations and orientations V = {{p_1, φ_1}, {p_2, φ_2}, ..., {p_n, φ_n}} constitute a 6D search space. In practice, the roll angle can generally be neglected, resulting in a 5D search space for the camera poses. Some researchers add further constraints (e.g., restricting the search space to a sphere around the object (Vasquez-Gomez et al., 2009), to some adaptive planes (Peng and Isler, 2019), or to height-wise data capturing (Sharma et al., 2019)). These constraints further reduce the computations, at the cost of a smaller chance of finding the optimal solution. Various heuristics ℋ (cf. Section 1.2) can be utilized to encourage the pose optimization algorithms to converge to a solution that meets the requirements of each specific application.
There are also various constraints S which can be considered during path planning. Battery life, the shortest path between viewpoints, detecting and avoiding permanent and moving obstacles, and excluding no-go areas in the scene are the main constraints investigated.
The predefined quality Q of the 3D reconstruction depends on the objective of the project and can differ between domains and applications. For example, in applications where full coverage of the 3D reconstruction is paramount, Q can be formulated as the completeness of the results, and other metrics like the accuracy of the reconstructed object can be relaxed. In domains where accuracy is most important, the other constraints have less priority and accuracy receives the highest weight in the planning. We address these metrics in Section 3.

Viewpoint and path planning strategies
Given that a maneuverable platform (e.g., a UAV in our scope) is available, an initially tempting idea could be to capture as many images as possible to achieve the best 3D reconstruction. Considering the limited endurance of UAVs and other practical flight-time limitations, the diminishing returns and even destructive effect of adding views after some point (Hepp, 2018; Hornung et al., 2008; Seitz et al., 2006), and the high computational cost of processing all images, this method is neither efficient nor applicable in real scenarios. Hence, most UAV-based reconstruction projects are currently performed either with manual piloting or with off-the-shelf commercial flight planning applications with the camera pointing in a fixed direction (Smith et al., 2018; Yan et al., 2021). Another idea could be to select/filter a subset of densely captured images for 3D reconstruction, to decrease the significant computational overhead and avoid the destructive effect of some images on the final product. These techniques require a complete set of already captured images from which a subset is selected to achieve a high-quality 3D reconstruction. Two examples of this category are Mauro et al. (2014), who employ a view importance measure for filtering the viewpoints, and Maldonado et al. (2016), who propose an incremental NBV-based approach. Besides the data capturing time, which remains a problem, the total computational cost of viewpoint selection plus processing the remaining images can be close to the processing time of all images. Furthermore, while the camera locations are densely sampled, the three rotational parameters of the camera pose are not optimized. Other approaches try to improve the output of regular-pattern data capturing by estimating the missing parts and low-quality 3D reconstructed areas and iteratively adding amending viewpoints to improve the final 3D product (Zhang et al., 2020). However, the best solution could be to have a holistic planning approach before starting data capturing, which is the topic of a rich body of viewpoint and path planning literature in various research communities.
Automated flight planning methods can be classified as either model-free or model-based methods (Jing et al., 2016; Kaba et al., 2017; Koch, 2020; Yan et al., 2021; Zhang et al., 2020). Model-based methods use an initial coarse model (a geometric proxy, or simply proxy) for generating optimal viewpoints and paths. Model-free methods, in contrast, tackle the problem without such an initial coarse model of the objects and environment and iteratively generate and update the model with new measurements. In the remainder of this section, we briefly discuss off-the-shelf mission planners and then delve into the model-free and model-based methods.
Usually offering limited parameter settings, commercial flight planners generate conservative trajectories (e.g., an orbit, a zigzag, or a lawnmower pattern at a safe flight height) to cover the scene (Koch et al., 2019; Kuang et al., 2020; Peng and Isler, 2019; Roberts et al., 2017). However, these trajectories are generated regardless of the scene geometry and the structure and distribution of the objects. Hence, these tools tend to over-sample some regions (e.g., rooftops) while under-sampling others (e.g., facades, convex areas, overhangs, and fine details), especially in dense areas and for complex objects, and therefore sacrifice reconstruction quality (Hepp et al., 2018b; Roberts, 2019; Zhou et al., 2020b). Furthermore, most such planning tools do not directly choose viewpoints adhering to all SfM and MVS constraints and do not directly account for complete coverage of the scene. Recently, Agisoft introduced a tool for designing an optimized path for image capturing and creating mission plans based on a rough model (Agisoft, 2022; Zhang et al., 2020). In this tool, a set of camera positions is generated to cover the object's surface with sufficient overlap. However, it cannot guarantee the completeness of the dataset in real-world situations, and the accuracy of the final point cloud is not explicitly considered. Although the Agisoft mission planning tool has a power-line detection algorithm, most off-the-shelf flight mission planners may cause accidents with adjacent obstacles in the environment and should be used cautiously and at a safe fixed distance to the object.

Model-free methods
Model-free methods do not have any prior information about the target structures or scenes. Since there is no prior information, it is challenging to compute an optimal scanning trajectory; model-free methods therefore have to find the best scanning trajectory in an online manner from a partially constructed model. This is equivalent to the exploration planning problem, which determines scanning paths online to explore an unknown, spatially bounded space. In this problem, the scanning platform has to estimate its location and construct 3D models in real-time. The location can be estimated using simultaneous localization and mapping (SLAM) (Leonard and Durrant-Whyte, 1991), and sparse 3D models can be constructed using an environmental mapping method (Estrada et al., 2005). SLAM systems are used to estimate the pose of the platform, capture data from the scene, and direct the platform to the defined positions. Still, before directing the platform, the pose of the next station must be determined. In the last two decades, many SLAM approaches have used probabilistic methods to reduce the impact of inaccurate sensors on the map, e.g., using millimeter waves for building relative maps (Dissanayake et al., 2001), integrating the particle filter and the extended Kalman filter (Montemerlo et al., 2002), or introducing square root smoothing and mapping (Dellaert and Kaess, 2006). Another family is visual SLAM, which can produce a fast 3D reconstruction on-the-fly (Fang and Zhan, 2020).
Most model-free methods employ the next-best-view (NBV) approach. This is a greedy method that finds a local solution from partial information: it repeatedly determines the viewpoint that obtains the largest amount of unknown information from the current map. What counts as unknown information is defined according to the purpose of the scanning scenario. Some methods (Batinovic et al., 2021; Cieslewski et al., 2017) try to thoroughly explore a volumetric map within a minimum time; they evaluate the unknown volumes to determine an NBV in the volumetric map. Other methods (Bircher et al., 2018; Song et al., 2020a) try to inspect all surfaces of an unknown structure and find the viewpoint that observes the largest uncovered surfaces. Yet other methods (Hardouin et al., 2020; Kompis et al., 2021; Song and Jo, 2018) try to reconstruct precise 3D models of target structures or environments by analyzing the reconstruction quality of the 3D models, such as completeness and accuracy (Knapitsch et al., 2017). This section classifies the model-free methods into three categories: frontier-based planning, volumetric-based planning, and surface-based planning. The following sections review each category; the methods are summarized in Table 1.
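The greedy NBV loop shared by most model-free methods can be sketched as the skeleton below. The candidate generator, gain function, and scan update are deliberately left as placeholders, since they are exactly what distinguishes the method families reviewed next; the skeleton itself is our abstraction, not any particular paper's algorithm.

```python
# Generic next-best-view (NBV) loop: on the current partial map, pick
# the candidate viewpoint with the highest information gain, scan from
# it, update the map, and repeat until nothing informative remains.

def nbv_loop(map_state, generate_candidates, gain, scan, max_iters=100):
    """Run the greedy NBV loop. `generate_candidates`, `gain`, and
    `scan` are method-specific callables supplied by the planner."""
    for _ in range(max_iters):
        candidates = generate_candidates(map_state)
        if not candidates:
            break
        best = max(candidates, key=lambda v: gain(map_state, v))
        if gain(map_state, best) <= 0:
            break  # no candidate adds information: exploration is done
        map_state = scan(map_state, best)  # capture data, update map
    return map_state
```

Because each step maximizes gain only on the current map, the loop is locally optimal; global coverage strategies (e.g., sector-based planning discussed below) exist precisely to mitigate this greediness.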

Frontier-based methods
Frontier-based planning is one of the most widely used mobile robot exploration methods. The main idea was originally proposed by Yamauchi (1997), who defined a frontier as the boundary between explored and unexplored areas in a 2D occupancy grid map (Moravec and Elfes, 1985). The occupancy grid map represents a 2D environment as a set of grid cells with three possible states: free, occupied, and unknown. The frontier cells are found by collecting free cells adjacent to unknown cells. The frontier-based method starts at an initial location with an initially scanned map and continuously moves toward the nearest accessible, unvisited frontier cell until all unknown cells are explored. While intuitive and straightforward, this method generally shows acceptable exploration performance. Umari and Mukhopadhyay (2017) extended the frontier-based method to detect the frontiers in the occupancy grid map efficiently. They found frontiers by expanding multiple rapidly-exploring random trees (RRTs) instead of using classical image processing methods like edge detection. As the RRTs expanded towards unexplored regions based on the Voronoi diagram, they quickly extracted the frontiers together with navigation paths.
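The frontier definition above is directly implementable: a frontier cell is a free cell with at least one unknown neighbor. The sketch below applies it to a tiny 2D occupancy grid with a 4-neighborhood; the grid contents are illustrative.

```python
# Frontier extraction in the sense of Yamauchi (1997): collect FREE
# cells that are 4-adjacent to at least one UNKNOWN cell.

FREE, OCCUPIED, UNKNOWN = 0, 1, -1

def extract_frontiers(grid):
    """Return (row, col) indices of frontier cells in a 2D grid."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            # a free cell becomes a frontier if any 4-neighbor is unknown
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers
```

The 3D extension discussed next replaces grid cells with octree volumes and the 4-neighborhood with 6- or 18-adjacency.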
The frontier-based method has been extended to 3D exploration tasks of a UAV by utilizing a 3D volumetric map (Hornung et al., 2013) instead of a 2D occupancy grid map. Frontier volumes are defined as the free volumes adjacent to unknown volumes in the volumetric map. The octree structure allows rapid access to the 6- or 18-adjacent volumes. Most methods iteratively determine the NBVs based on the frontier information. Oßwald et al. (2016) improved the exploration performance of the general frontier-based method by providing additional guidance via global routes. They utilized a user-provided topological graph representing the topology of the environment. A global route is computed by solving a traveling salesperson problem (TSP) on the topological graph. A mobile robot sequentially visits every node along the TSP tour while exploring local regions using the frontier-based method. This produces more effective exploration paths that prevent revisiting regions that have already been covered. Cieslewski et al. (2017) extended the frontier-based method to maintain the maximum speed of a mobile robot. They determined an NBV by selecting target frontier volumes inside the current field of view (FoV). This prevents the robot from slowing down due to heading rotation, so the robot can explore unknown regions at high speed. The method reduces the exploration time, although it may increase the trajectory length.
Several studies (Batinovic et al., 2021; Dai et al., 2020) have focused on reducing the computation time of frontier extraction and clustering. Dai et al. (2020) proposed an implicit frontier clustering method, updating the frontiers only in the volumes inside the camera frustum at every depth integration step. Batinovic et al. (2021) filtered out many frontier points in multi-resolution volumes on Octomap and applied mean-shift clustering for frontier clustering. These approaches significantly reduce the number of frontiers and thus the computation time of candidate viewpoint evaluation. Others (Meng et al., 2017; Shade and Newman, 2011) have tried to determine next-best trajectories instead of NBVs, evaluating view sequences to find the most informative view paths. Shade and Newman (2011) proposed a 3D exploration method for stereo-camera-based scanning. They first extracted frontier volumes in a volumetric map and composed 3D vector fields toward the frontier volumes; they then computed the steepest descent path in the vector fields to obtain the next-best trajectory for 3D scanning. Meng et al. (2017) proposed a two-stage planning method for 3D exploration: it first generates frontier sets by clustering adjacent frontiers and then computes a coverage path over all frontier sets by solving a fixed-start open TSP. The mobile robot moves along the first edge of the coverage path and continuously recomputes the path. Since this method considers global coverage instead of a single viewpoint, the number of revisits of the same area can be reduced.
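A fixed-start open TSP, as used for ordering frontier clusters, differs from the classical TSP only in that the tour starts at the robot's current pose and does not return to it. For a handful of cluster centroids it can be solved by brute force, as in this illustrative sketch (real planners use heuristic solvers; the coordinates below are invented):

```python
import itertools
import math

def open_tsp(start, targets):
    """Brute-force fixed-start open TSP: shortest path that starts at
    `start` and visits every target exactly once, without returning."""
    best_order, best_len = None, float("inf")
    for perm in itertools.permutations(targets):
        length, pos = 0.0, start
        for p in perm:
            length += math.dist(pos, p)
            pos = p
        if length < best_len:
            best_order, best_len = list(perm), length
    return best_order, best_len
```

Brute force is O(n!), which is acceptable only because frontier clustering first collapses thousands of frontier cells into a few cluster centroids.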

Volumetric-based planning
Volumetric-based planning focuses on reconstructing a complete volumetric model rather than on fast exploration of unknown areas. Unlike frontier-based planning, it analyzes not only the frontier but the entire spatial information of the volumetric map. Estimated 3D information is sequentially integrated into an octree structure (Hornung et al., 2013). Each volume has one of three states (free, occupied, or unknown), updated based on its occupancy probability. The volumetric map is well suited for view planning since the multi-resolution representation of Octomap gives easy access to the entire spatial information. Furthermore, it can efficiently perform ray-casting for visibility checks, an operation essential for viewpoint evaluation.
Volumetric-based planning has also been widely used in object modeling. Vasquez-Gomez et al. (2017) proposed a volumetric-based planning method for single-object modeling. They evaluated viewpoints by analyzing unknown volumes together with overlapped volumes for point cloud registration, and applied a hierarchical ray-tracing method for fast visibility checks: it starts from a coarse-resolution map and applies high-resolution ray-tracing only around occupied volumes. Delmerico et al. (2018) proposed several information gain functions for viewpoint evaluation on the volumetric model. To quantify the visible information, they counted unobserved voxels or calculated a weighted sum of voxel entropies; they also considered the voxels on the object's rear side with their entropy. They evaluated the performance of the proposed information gain functions in various modeling scenarios. Daudelin and Campbell (2017) proposed a view planning method for modeling an object without any prior information such as its size or bounding box. They dynamically extended the search spaces for viewpoint sampling based on the partial reconstruction; the search spaces are not restricted to known free space. They estimated all reachable configurations of the mobile robot's sensor and generated viewpoint samples from these reachable configurations. Similar to Delmerico et al. (2018), the total information gain of each candidate viewpoint is computed.
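The simplest of these gain functions, counting unobserved voxels visible from a viewpoint, can be sketched on a toy grid. This is a drastic simplification of Octomap ray-casting: rays are given as precomputed lists of voxel indices, and a ray stops at the first occupied voxel. The grid and rays are illustrative.

```python
# Toy "count unobserved voxels" information gain for viewpoint
# evaluation: sum the unknown voxels each ray can see before it is
# blocked by an occupied voxel.

FREE, OCCUPIED, UNKNOWN = 0, 1, -1

def ray_gain(grid, ray):
    """Unknown voxels visible along one ray (an ordered list of (r, c)
    indices) until the ray hits an occupied voxel."""
    gain = 0
    for r, c in ray:
        state = grid[r][c]
        if state == OCCUPIED:
            break  # ray is blocked; voxels behind are not visible
        if state == UNKNOWN:
            gain += 1
    return gain

def viewpoint_gain(grid, rays):
    """Information gain of a viewpoint = total visible unknown voxels
    over all rays cast from it."""
    return sum(ray_gain(grid, ray) for ray in rays)
```

The entropy-weighted variants mentioned above replace the unit increment with a per-voxel entropy term computed from the occupancy probability.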
Several studies have also proposed volumetric-based planning methods for exploration tasks. Bircher et al. (2016) employed a receding horizon planning strategy for exploration planning, which iteratively computes an optimal exploration path and only executes its first step. This method generates a set of viewpoints by expanding an RRT and selects the best branch, i.e. the one providing the maximum information gain. The information gain is defined as the total volume of visible unknown cells, penalized by the distance costs of the nodes. The method then sets the NBV to the first node of that branch. This approach can efficiently find exploration paths with low computational complexity even in large-scale environments. Papachristos et al. (2019b) extended the receding horizon planning method (Bircher et al., 2016) to additionally consider localization uncertainty in exploration tasks. The proposed method comprises two planning stages: volumetric exploration planning and uncertainty-optimization planning. Like receding horizon planning, the exploration planning expands the branches of a random tree and determines an NBV from the first node of the most informative branch. To compute the information gain, they consider not only unknown volumes but also the occupancy probability of occupied volumes. The uncertainty-optimization planning then finds an optimal trajectory to the NBV that provides the minimum localization uncertainty for the SLAM module. Dharmadhikari et al. (2020) also extended the receding horizon planning to provide fast and agile paths for a MAV. They directly generated view configurations and admissible paths based on motion primitives, and then determined a future-safe path that provides the maximum information gain and guarantees continuous fast motion.
Batinovic et al. (2022) proposed an exploration planning method for LiDAR scanning data. They applied a recursive shadow-casting algorithm for fast information gain computation on large-scale point clouds, and proposed a cuboid-based path evaluation method that estimates the information gain of each RRT edge instead of each node. Song and Jo (2017) applied an inspection approach to a model-free planning method. Classical inspection approaches precompute, given a prior model of a target structure, a coverage path that provides full visual coverage of all surfaces of that model. In contrast, Song and Jo (2017) addressed online inspection planning on an incrementally updated and partially known model. Their inspection planning method first determines an NBV that explores the largest unknown area in a volumetric map. Similar to (Bircher et al., 2016), it expands the branches of an RRT and evaluates the information gain of each branch to determine the NBV. The method then plans a local inspection path to the NBV that provides complete visual coverage of the near-frontiers. The local inspection path is continuously replanned according to the updated near-frontiers until the robot reaches the NBV. This method improves the completeness of volumetric modeling because it can thoroughly scan all small unreconstructed regions. Song et al. (2020a) extended the online inspection planning to also consider global coverage planning. They proposed an online map partitioning method, which decomposes the entire space into a set of sectors by clustering free and unknown volumes. The decomposed sectors have a compact and convex shape and therefore represent a topological map. The method plans a global coverage path over the unexplored sectors and determines an NBV to move toward the next sector. It then plans a local inspection path that fully covers the local frontiers. This method significantly reduces the total exploration time and path length by reducing the number of revisits of the sectors.
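The best-branch selection shared by these receding-horizon planners can be sketched as follows (a minimal illustration; the tree construction and the ray-casting behind `visible_gain` are abstracted away, and the exponential distance discount is an assumption in the spirit of the gain formulations above):

```python
import math

class Node:
    """A pose in the random tree; `parent` is None for the root."""
    def __init__(self, pos, parent=None):
        self.pos, self.parent = pos, parent

def branch_gain(node, visible_gain, lam=0.5):
    """Information gain accumulated from the root down to `node`, with
    each node's gain discounted by the travel cost needed to reach it."""
    chain, n = [], node
    while n.parent is not None:
        chain.append(n)
        n = n.parent
    chain.reverse()                       # root's child first
    total, cost, prev = 0.0, 0.0, n      # n is now the root
    for m in chain:
        cost += math.dist(prev.pos, m.pos)
        total += visible_gain(m.pos) * math.exp(-lam * cost)
        prev = m
    return total

def next_best_view(root, nodes, visible_gain):
    """NBV = first node of the branch with maximum accumulated gain;
    only this first edge is executed before the tree is regrown."""
    best = max(nodes, key=lambda n: branch_gain(n, visible_gain))
    while best.parent is not None and best.parent is not root:
        best = best.parent
    return best
```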

Surface-based planning
Surface-based planning concentrates on reconstructing a precise 3D surface model, represented as surface meshes (Kazhdan and Hoppe, 2013), surfel points (Whelan et al., 2016), or a truncated signed distance function (TSDF) (Newcombe et al., 2011). Volumetric models can hardly express complex surface information sufficiently; therefore, surface-based planning methods analyze the shape and trend of the reconstructed surfaces instead of volume information. Most surface-based planning methods have been used to reconstruct small-scale objects. Chen and Li (2005) predicted the shape and trend of simple and smooth surfaces from the curvature tendency and determined an NBV based on the predicted surface information. Wu et al. (2014) also predicted a tentative surface model from a partial reconstruction by using the Poisson surface reconstruction algorithm (Kazhdan and Hoppe, 2013). They then evaluated the completeness and smoothness of the Poisson isosurfaces to determine an NBV. Lee et al. (2020) detected surface shapes with surface primitives such as planes, cylinders, and spheres and utilized them for NBV determination.
Recently, several studies have attempted to apply surface-based planning to large-scale modeling tasks. Hardouin et al. (2020) analyzed the reconstructed surfaces to determine an NBV for MAV exploration. They estimated 3D data using a stereo camera or RGB-D sensor and integrated them into a surface model based on the TSDF (Newcombe et al., 2011). A TSDF can reconstruct detailed 3D surfaces in real-time from a volumetric distance field containing signed distances to the closest surface. The TSDF is also useful for identifying missing model parts during an online reconstruction (Monica and Aleotti, 2018). To determine an NBV, Hardouin et al. (2020) first detected incomplete surface elements (ISEs) from the TSDF and then generated viewpoints along the normal directions of the ISEs at a specified distance. They clustered the neighboring viewpoints and selected the cluster covering the most ISEs. They incrementally updated the TSDF model by scanning from an NBV until no ISE was detected. Song and Jo (2018) proposed a surface-based exploration method, which extends the online inspection planning of (Song and Jo, 2017) to surface model reconstruction. The surface-based exploration analyzes both the volumetric map and the surface model in a TSDF for fast exploration and complete reconstruction. The method determines an NBV that explores the largest unknown areas in the volumetric map and then plans an inspection path covering the surrounding low-confidence surfaces and frontiers. To cover the low-confidence surfaces, it generates a viewpoint set for each surface by inversely composing a view frustum from the surface along its normal direction. An inspection path is computed by finding the minimum-distance trajectory that visits at least one viewpoint from each viewpoint set; a generalized TSP algorithm can compute this trajectory. The method then refines the inspection path to cover the near-frontiers by applying the inspection planning method of (Song and Jo, 2017). Their method yields a longer path length and completion time, but the coverage percentage and the quality of the model are improved; using a better objective function could further improve the quality of the 3D reconstruction. Schmid et al. (2020) proposed an exploration planning method for reconstructing a surface model. They implemented a new RRT method, which continuously expands a single tree to obtain global coverage with maximum utility. Unlike the original RRT, it rewires the tree nodes according to their utility to maintain non-executed nodes and sub-trees. The method then refines the intermediate paths and computes a global coverage path that maximizes the utility. They also proposed an information gain function for TSDF-based surface models that considers not only the TSDF weight but also the sensing error; the sensing error of the depth sensor is modeled as a quadratic weight of the depth range. In their experiments, the proposed method reconstructed more accurate 3D models than other surface-based planning methods such as (Yoder and Scherer, 2016) and (Song and Jo, 2018).
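A quadratic range-error weighting of this kind can be illustrated with a small sketch (the function names, constants, and normalization are our own assumptions, not Schmid et al.'s implementation):

```python
def observation_weight(z, z_min=0.5, z_max=5.0):
    """Relative confidence of a depth measurement at range z. Assuming the
    depth error grows quadratically with range, the inverse-variance weight
    falls off as (z_min / z)^4, normalized to 1.0 at the minimum range."""
    if z < z_min or z > z_max:
        return 0.0          # outside the sensor's trusted range
    return (z_min / z) ** 4

def voxel_gain(z, current_weight, w_cap=5.0):
    """Expected utility of re-observing a TSDF voxel at range z:
    the increase of its (capped) accumulated confidence weight."""
    return min(w_cap, current_weight + observation_weight(z)) - current_weight
```

Under this model a close observation contributes far more confidence than a distant one, so the planner naturally prefers viewpoints near uncertain surfaces.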
A new informed sampling method for exploration path planning is introduced in (Kompis et al., 2021). The informed sampling method ranks the sampled viewpoints based on a utility function and identifies in advance the viewpoint candidates that are likely to have high information gain. They used a stereo camera for 3D mapping of the environment and Voxblox (Oleynikova et al., 2017) as the map representation of the scene. They generated viewpoints based on the surface normals and the frontier voxels' normals and computed their ranks according to an artificial potential field, which considers the MAV position and each viewpoint's distinctiveness and repetitiveness. The method then evaluates only a subset of the viewpoints according to these ranks, which significantly reduces the computation time of viewpoint evaluation. Song et al. (2020b) applied a surface-based planning method to online multi-view stereo (MVS) reconstruction for the first time. To this end, they implemented an online MVS system that reconstructs a large-scale scene in real-time. The system computes camera poses with key-frame-based SLAM (Mur-Artal et al., 2015) and estimates depth maps of the key-frames by performing stereo matching on neighboring key-frames. For depth estimation, it utilizes a monocular mapping algorithm, REMODE (Pizzoli et al., 2014). The estimated depth maps are integrated into a single 3D model based on the surfel mapping method (Whelan et al., 2016). They utilized the surface-based exploration method to plan a local inspection path for scanning low-confidence surfaces. Trajectory optimization is applied to maximize the MVS performance, considering several multi-view stereo heuristics such as parallax, relative distance, and focus angle. Furthermore, Song et al. (2021a) extended this work toward a more precise 3D model reconstruction. They utilized a deep-learning-based MVS network, CasMVSNet (Gu et al., 2020), instead of REMODE (Pizzoli et al., 2014) for depth computation. CasMVSNet estimates a depth map using multiple small cost volumes and progressively reduces the depth hypothesis range in a coarse-to-fine manner, making it possible to process a high-resolution image in real-time. CasMVSNet provides better reconstruction performance, with respect to accuracy and completeness, than REMODE.
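The rank-then-evaluate idea behind informed sampling can be sketched as follows (an illustrative stand-in for the potential-field ranking of Kompis et al.; the proxy score combining travel distance and a repetitiveness penalty, and its weights, are our own assumptions):

```python
import math

def cheap_rank(candidate, robot_pos, visited, w_dist=1.0, w_rep=2.0):
    """Cheap proxy score of a candidate viewpoint (lower is better):
    travel distance plus a penalty for lying close to already-executed
    viewpoints, i.e. for being repetitive."""
    travel = math.dist(candidate, robot_pos)
    repetition = sum(math.exp(-math.dist(candidate, v)) for v in visited)
    return w_dist * travel + w_rep * repetition

def select_viewpoint(candidates, robot_pos, visited, expensive_gain, k=5):
    """Run the expensive gain evaluation (e.g. ray casting) only on the
    k best-ranked candidates and return the winner."""
    ranked = sorted(candidates, key=lambda c: cheap_rank(c, robot_pos, visited))
    return max(ranked[:k], key=expensive_gain)
```

The saving comes entirely from the second function: the costly evaluation runs k times instead of once per candidate.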
Surface-based planning analyzes detailed reconstructed surfaces in a surface model; however, the surface model may require a lot of memory, which can limit an onboard implementation of the 3D scanning system. To address this problem, a coarse surface model can be utilized for path planning by lowering the resolution of the reconstructed model. However, the advantage of surface-based planning, analyzing the surface shape and trend of a precise 3D model, is then lost. Finding a way to efficiently handle large-scale 3D data for an onboard system would be an interesting direction for future work. Table 1 summarizes the state-of-the-art model-free approaches.

Model-based methods
Model-based methods leverage a coarse representation of the scene geometry (geometric proxy) for optimal viewpoint and path planning. Recently, Agisoft Metashape 1.5 introduced functionality for designing an optimum path plan for image capturing, which creates mission plans based on a rough model (Agisoft, 2022; Zhang et al., 2020). In this approach, a set of viewpoints with sufficient overlap is generated to cover the surface of the object. However, the computed static network of camera poses cannot guarantee the completeness of the dataset in real-world situations, and the accuracy of the final point cloud is not considered explicitly. Recently, explore-then-exploit methods (Figure 2), which consist of two phases, have been utilized in UAV mission planning (Peng and Isler, 2019; Roberts et al., 2017; Yan et al., 2021; Zhang et al., 2020). The first phase is the exploration, which generates the initial viewpoints and flight path similar to off-the-shelf flight planners or a manual flight (Koch et al., 2019). The second phase is called the exploitation phase, where the geometric proxy is employed to design optimum viewpoints and trajectories for acquiring proper images for 3D reconstruction (Sadeghi et al., 2019; Zhou et al., 2020b). The main advantage of the explore-then-exploit strategy is that prior knowledge of the scene geometry facilitates the global optimization of 3D coverage and accuracy in the secondary exploitation visit (Peng and Isler, 2019).

Exploration
Although most researchers use a nadir flight for this phase, employing orbit patterns is also reported in some literature (Roberts et al., 2017). The outcome of the exploration phase approximates the scene geometry and is called the "geometric proxy". The geometric proxy is usually a mesh (Hoppe et al., 2012; Sharma et al., 2019), a point cloud (Yan et al., 2021), or supported by a voxel representation (Hepp et al., 2018b). The initial model or geometric proxy obtained by exploration may include many gaps or inaccurate areas (Hepp et al., 2018b; Koch et al., 2019). Therefore, this rough 3D representation of the scene is mostly used for optimal viewpoint planning (to generate a high-fidelity reconstruction) as well as for computing the 3D safe navigable zone for the platform (Hepp et al., 2018b; Roberts et al., 2017; Smith et al., 2018). However, (Yan et al., 2021; Zhang et al., 2020) used this geometric proxy to locate incomplete or low-quality areas to guide the camera placement in the exploitation phase.
Different from most other approaches, which use an exploratory flight for generating the geometric proxy, in (Zhou et al., 2020a) a very coarse 2.5D model is generated from Google Maps (providing 2D footprints of buildings) and a single satellite image (for height estimation based on shadows).
Using this approach, the planning can be done before visiting the target site. However, it needs an up-to-date satellite image captured on a sunny day and assumes that the ground is relatively flat. Moreover, extruding the 2D footprints of irregularly shaped architectural objects leads to non-optimal viewpoint planning in the exploitation phase.
Figure 3: Comparison of proxies. First row: coarse 2.5D proxies generated in (Zhou et al., 2020a); second row: 3D proxies generated using an exploratory flight. Images adapted from (Zhou et al., 2020a).
A semantic-aware exploit-and-explore flight planning approach is introduced in (Koch et al., 2019). The main contribution of this research lies in employing semantic information from the exploratory flight for extracting the target object and generating a semantically-enriched coarse proxy for defining inadmissible airspaces for safe trajectories. A fully convolutional network (Long et al., 2015) is applied to 3069 images (from the Semantic Drone Dataset, the ISPRS 2D Semantic Labelling Benchmark (Rottensteiner et al., 2014), and some manually annotated UAV images). OpenStreetMap (OSM) information is used to improve the results. The object of interest is extracted from the segmentation result using region growing seeded by user input. The semantics of the environment is very important (both practically and legally) in defining the flight trajectory, especially in dense urban areas.
It is worth mentioning that with the recent advancements in the concepts and realization of BIM, digital twins, and smart cities, rough 3D models of large structures like buildings and bridges are becoming increasingly available. Although this 3D spatial information can alleviate the exploration in UAV mission planning, the semantic and geometric information of the whole scene should be updated for a safe UAV mission and a high-fidelity 3D reconstruction.

Exploitation
After generating the geometric proxy, exploitation aims at designing optimum viewpoints and trajectories to provide the data for high-quality 3D reconstruction (Sadeghi et al., 2019; Zhou et al., 2020b), constrained by platform and environment conditions. Hoppe et al. (2012) proposed an uncertainty-aware and heuristic-based viewpoint planning approach which aims at accurate 3D coverage of the scene. Their approach belongs to the generate-and-test approaches: it generates potential camera poses by assigning one fronto-parallel view to each triangle of the proxy mesh. Then an angular histogram (four bins in [0-40]) is computed for each triangle. Another histogram, the accumulation of the cameras of all triangles for each angular bin, is used to score the cameras. Based on the camera scores (which are updated after the selection of each viewpoint), a greedy selection of the best (i.e. observing the most triangles at a novel angle) overlapping neighboring viewpoints is employed to select a subset of the viewpoints until one of two stopping conditions is reached for each (in practice 95%) of the triangles: 1) the estimated accuracy, measured by the largest semi-axis of the covariance ellipsoid, is acceptable, or 2) at least one camera of each bin is selected. Although the efficiency of this approach is not reported, we believe that initializing the camera positions with one candidate per triangle of the proxy mesh may generate a very dense search space and is not applicable in real-world large-scale projects. Since this is an offline approach, no feedback about texture or illumination conditions is considered during data capturing. Jing et al. (2016) subdivided the 3D rough model using the Bubble Mesh method; the initial model is divided into small patches. Afterward, randomly sampled viewpoints are generated around the target building. After investigating the visibility (between each surface patch and each viewpoint) and the adjacency (between each pair of viewpoints), the selection of viewpoints is performed as a modified set covering problem with constraints, using a neighborhood greedy search algorithm. SfM and MVS heuristics are not involved in the viewpoint selection or in computing the orientation of the cameras, and a quantitative evaluation of the results is missing. Although the proposed method for designing image orientations seems feasible, generating random positions is risky and might cause gaps or low-quality areas on complex objects.
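The greedy set-covering step underlying both (Hoppe et al., 2012) and (Jing et al., 2016) can be sketched as follows (a minimal illustration; `visibility` would come from the patch-viewpoint visibility analysis, and the 95% coverage target mirrors the stopping criterion reported above):

```python
def greedy_cover(visibility, n_patches, target=0.95):
    """Greedy set cover: repeatedly pick the viewpoint that sees the most
    still-uncovered surface patches, until `target` fraction of patches is
    covered or no remaining viewpoint adds coverage.
    `visibility` maps each candidate viewpoint to the set of patch ids it sees."""
    uncovered = set(range(n_patches))
    remaining = dict(visibility)
    selected = []
    while len(uncovered) > (1.0 - target) * n_patches and remaining:
        best = max(remaining, key=lambda v: len(remaining[v] & uncovered))
        if not remaining[best] & uncovered:
            break                      # no viewpoint sees anything new
        selected.append(best)
        uncovered -= remaining.pop(best)
    return selected
```

Greedy selection gives the classical (1 - 1/e) approximation guarantee for set cover, which is why it appears so often in viewpoint planning.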
In the seminal research of (Roberts et al., 2017), viewpoint selection and trajectory planning are jointly considered in a unified global optimization problem, formulated as a submodular orienteering problem. This innovative approach finds a path that maximizes scene coverage while adhering to the SfM and MVS heuristics and respecting the limited travel budget of the drone. After introducing a coverage model, the approximated optimal orientations of all candidate cameras are computed. The submodular orienteering problem is then transformed into an additive orienteering problem, which is solved as an integer linear program. However, the coverage model of this approach is monotone, which means that selecting more cameras never reduces the coverage score; this is not in line with the findings of (Hepp, 2018; Seitz et al., 2006). Moreover, there is no stopping condition except the travel budget, which is independent of the complexity of the object and the desired quality. The lack of online processing also prevents any reaction to low-texture areas and moving objects during the flight. Lastly, uniform sampling of the object points avoids putting more attention on the edges. Hepp et al. (2018b) proposed a system for 3D reconstruction of building-scale scenes which is similar in spirit to (Roberts et al., 2017). Based on the coarse mesh (generated from images with an overhead pattern), a volumetric occupancy map (fixed voxel size = 0.2 m) containing occupied, free-space, and unobserved voxels is generated, where each voxel is also attributed with a measure of observation quality. The authors introduced an approximated camera model (to make the optimization problem suitable for a submodular formulation) to measure the expected information gain from an individual viewpoint, independently from other viewpoints. After generating the viewpoint candidates, ray-casting is used to compute the visible voxels of each viewpoint to evaluate its contributed information. Since this model entirely ignores the MVS heuristics, the proposed method encourages fronto-parallel views during the computation of the information contribution of the voxels. To solve the submodular optimization with a travel-budget constraint, an adaptation of the recursive strategy introduced in (Chekuri and Pál, 2005) is used. This strategy recursively splits the trajectory and travel budget into two parts and selects the reachable (with the current travel budget) viewpoint with the highest information gain. The authors employed the voxels' attributes (free, occupied, unknown) to find free-space motion paths. Paths between viewpoints are preferably straight lines or, in case of an obstacle, piecewise linear paths computed using the RRT* algorithm (Karaman and Frazzoli, 2011). In spite of very promising results, this approach uses images from the exploratory flight in the reconstruction and also in the uncertainty computation. This leads to a problem when the geometric proxy is not generated from an exploratory flight (cf. section 2.3.1); even if a good geometric proxy is available, the exploitation can only start after an initial flight. Another issue of these methods is that their optimization needs the minimum number of viewpoints as input and is not guaranteed to converge; the user must decide how long to run the optimization. Furthermore, using a recursive greedy algorithm to maximize the optimization objective may lead to some viewpoints which cannot form a single adjustable block; hence, some other images are added to make sure that all the images are connected. (Hoppe et al., 2012; Mostegel et al., 2016) solve this problem by applying an overlapping constraint in the recursive selection process.
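The recursive budget-splitting strategy of (Chekuri and Pál, 2005) can be sketched as follows (a simplified single-split variant for illustration; the original algorithm additionally guesses the intermediate budget over several values and is quasi-polynomial, and the recursion here is exponential in `depth`):

```python
def recursive_greedy(s, t, budget, depth, dist, gain, nodes, seen=frozenset()):
    """Best (gain, path) from s to t within `budget`, recursively trying
    every reachable node m as a midpoint and splitting the spare budget
    evenly between the two halves. `gain(S)` is the reward of a set of
    newly visited nodes (submodularity handled via the `seen` set)."""
    if dist(s, t) > budget:
        return 0.0, []                         # t unreachable within budget
    best_g, best_p = gain(frozenset([t]) - seen), [t]
    if depth == 0:
        return best_g, best_p
    for m in nodes:
        if m in seen or dist(s, m) + dist(m, t) > budget:
            continue
        slack = budget - dist(s, m) - dist(m, t)
        b1 = dist(s, m) + slack / 2.0          # split spare budget evenly
        g1, p1 = recursive_greedy(s, m, b1, depth - 1, dist, gain, nodes, seen)
        g2, p2 = recursive_greedy(m, t, budget - b1, depth - 1, dist, gain,
                                  nodes, seen | frozenset(p1))
        if g1 + g2 > best_g:
            best_g, best_p = g1 + g2, p1 + p2
    return best_g, best_p
```

Passing the first half's visited nodes as `seen` to the second half is what prevents double-counting marginal gains, mirroring the submodular treatment in the papers above.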
An explore-then-exploit approach is introduced in (Peng and Isler, 2019), which, after exploring and generating the geometric proxy, builds a set of adaptive viewing planes for viewpoint selection. This is an iterative approach (usually one exploratory flight and two exploitation flights) which repeats data capturing after identifying the low-quality regions until the quality converges or a desired level of quality is achieved. The main novelty of this approach, an extension of their previous study (Peng and Isler, 2018), is considering a set of adaptive viewing planes (Figure 4) to limit the search space for camera placement, leading to a similar resolution for the images in each patch. However, generating viewing planes based on the slope of the regions may cause gaps because the viewing planes are created based on the average normal of the patch points; hence, in complex regions a viewing plane might not be able to cover all regions properly. In (Smith et al., 2018), a dilation operator with a ball structuring element (whose radius is the desired minimum distance) is applied to the geometric proxy to estimate a closed surface boundary between safe and potentially dangerous airspace. The authors employed several reconstructability heuristics for predicting the reconstruction quality: the contradicting effects of parallax on the triangulation error and the matchability are estimated and considered, and the distance of the camera to the object, the binary visibility of the points based on the geometric scene proxy, and the deviation from the surface normal are also elaborated in the heuristic function. The authors use the geometric proxy to initialize the camera network with a fixed number of uniformly distributed cameras. Utilizing a set of uniformly distributed samples on the geometric proxy, the pairwise view parameters are improved incrementally using the simplex method to optimize the reconstructability of the samples. Different aspects of the 3D reconstruction heuristics are formulated and embedded very well in the objective function of this approach. However, since fixed and uniformly distributed sample points are used, depth discontinuities posed some problems in the experiments. Moreover, the inability of this offline approach to adapt the number of views could be improved to avoid oversampling and undersampling of the viewpoints. Furthermore, like almost all other approaches, varying surface texture and lighting are not considered. Sharma et al. (2019) introduced a floor-wise viewpoint and path planning method for optimal building surface coverage. The geometric proxy is a mesh which can be obtained either from building blueprints or from cross-sections of a rough model generated by a reconnaissance (exploratory) flight. For each height a contour is generated, and a minimum of 60% overlap between image strips is considered to cover the contours in 3D. By offsetting the contours by a fixed distance, a 2D flight map is generated. This initial flight path can be updated to avoid obstacles, which are computed via MVS reconstruction from the reconnaissance flight. The camera positions are initialized by sampling the flight path every 1 m, and the camera poses (positions and orientations) are then optimized using a genetic algorithm to maximize the total coverage such that at least N (= 3) cameras see each patch of the surfaces. Since this approach does not consider most MVS heuristics, the generated 3D model is suitable for applications like VR, in which the details and completeness of the dataset are more important than the accuracy of the 3D reconstruction.
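The parallax trade-off at the heart of such reconstructability heuristics can be sketched as follows (an illustrative model, not Smith et al.'s actual formulation: the sine term, the Gaussian matchability falloff, the linear distance penalty, and all constants are our own assumptions):

```python
import math

def reconstructability(parallax_deg, dist_m, alpha0=20.0, d_max=30.0):
    """Heuristic pairwise reconstruction quality of a surface sample:
    sin(parallax) rewards triangulation geometry (wide baselines),
    a Gaussian falloff in parallax models decreasing stereo matchability,
    and a linear term penalizes camera-to-object distance."""
    a = math.radians(parallax_deg)
    triangulation = math.sin(a)                         # baseline geometry
    matchability = math.exp(-(parallax_deg / alpha0) ** 2)
    distance = max(0.0, 1.0 - dist_m / d_max)
    return triangulation * matchability * distance
```

Because the two parallax terms pull in opposite directions, the score peaks at a moderate intersection angle rather than at the fronto-parallel or wide-baseline extremes, which is exactly the behavior the heuristic is meant to capture.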
In (Yan et al., 2021), by evaluating the quality of the geometric proxy (a point cloud), two types of new viewpoints are selected in a constrained view sampling space. The new viewpoints of the first type are intended to improve incomplete or low-quality areas, and the viewpoints of the second type are added to fully cover the entire scene. Finally, various optimization algorithms are employed to generate a smooth 3D trajectory; ACO, PSO, and Branch and Bound (BnB) methods are compared, and the best results are reported with ACO. Similar to (Smith et al., 2018), the sample points on the geometric proxy are distributed uniformly; therefore, most sample points with lower quality scores were located at the edges of buildings or structures with more geometric details. The two main objectives of their method are maximizing the coverage and minimizing the path length, so the quality of the 3D reconstruction does not play any role in their viewpoint planning. (Kuang et al., 2020) proposed an online path planning method for large-scale urban scene reconstruction. In this research, the geometric proxy of the buildings is generated using an oblique initial flight around the buildings. Then, instances in the scene are segmented using the well-known Mask R-CNN (He et al., 2017) approach, and simple geometric shapes are used to represent the layout of the scene. The flight trajectory is minimized on two levels, namely the inter-building and intra-building levels. In the first level of optimization, each building is considered a node of a graph (Figure 5) and the shortest path is computed using the classical Dijkstra algorithm. In the second level, the task is formulated as the Chinese postman problem. For on-the-fly prediction of the coverage of the whole scene, a lightweight SLAM framework is utilized; unobserved spaces are then covered by adding new viewpoints. If the endurance of the UAV allows, this online approach also has a mechanism that tries to capture close-up images of architectural details. The main limitation of this approach is that SfM and MVS heuristics are not directly considered in the planning. Moreover, since the prior information of the coarse scene model is not fully used during the flight, more viewpoints are required. (Huang et al., 2018) proposed an efficient viewpoint and path planning method based on a fast MVS and NBV algorithm. This approach consists of an online front-end for viewpoint and path planning to control the image capturing process, and an offline back-end for generating the final dense 3D model (Figure 6). The online front-end incorporates a closed loop of iteratively updating the initial model, coverage evaluation, and NBV planning to achieve satisfactory coverage.
Figure 6: The pipeline of the proposed approach in (Huang et al., 2018). Image courtesy of (Huang et al., 2018).
The proposed approach uses an initial model generated from an enclosing rectangular flight around the object. A confidence score is computed for each triangle of the mesh, considering position consistency, normal consistency, and fronto-parallelism. Searching for NBVs is done plane by plane, from high to low altitude, to speed up the process by coarsening the search space from 5D to 4D. In each iteration, an obstacle-aware A* algorithm is used for optimal path planning, connecting the NBVs starting from the UAV's current position. The main novelty of this approach is using on-the-fly incremental SfM for local, per-vertex updating of the mesh model. However, it relies heavily on GNSS signal availability, which is not always given, especially in dense urban areas and close to large objects. Koch et al. (2019) proposed a semantic-aware exploit-and-explore flight planning approach to define inadmissible airspaces for safe trajectories. The main contribution of this research lies in employing semantic information from the exploratory flight for extracting the target object and generating a semantically-enriched coarse proxy for defining free and occupied airspace and avoiding prohibited flight zones. After sampling a large number of viewpoints on the bounding box of the extracted object of interest, the normalized distance-weighted mean of all visible 3D points is utilized to compute the orientation of each viewpoint candidate. Distance-based and observation-angle-based gains, coupled with camera constellation hemisphere slices on each object point, are used as heuristics to predict the suitability of the viewpoints for the reconstruction. Inspired by (Hepp et al., 2018b; Roberts et al., 2017), greedy submodular optimization is utilized to add the viewpoint with the highest additive information reward to the output set. Since a prior model of the object is available, model-based methods are able to compute reasonable poses that can be used to create a complete and accurate 3D reconstruction. By employing various objective functions which include SfM and MVS heuristics, accuracy and completeness are involved explicitly in selecting the optimum viewpoints. Most of these approaches use monocular images and tackle the problems of viewpoint and path planning separately, i.e. after computing the effective viewpoints, path optimization starts to compute an efficient solution to approach the computed poses. As Table 2 shows, most model-based methods are offline; an interesting direction could be using a fast online SfM and MVS system to enable on-the-fly computations.
To the best of our knowledge, all model-based approaches use uniform sampling of the proxy for further inspection, or use a uniform discretization of the environment as the 3D representation of the object of interest. Investigating the results of the current approaches reveals that most of the large errors lie at highly curved parts and sharp corners of the objects. We suggest considering an adaptive sampling of the objects where the overall density depends on the desired LoD. However, the density should also be adaptive to the local curvature, which leads to more samples on high-curvature areas and geometric boundaries.
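The curvature-adaptive sampling suggested above could be sketched as follows (an illustrative sketch using the PCA-based surface-variation measure as the curvature proxy; the neighborhood size, keep-probabilities, and brute-force neighbor search are assumptions):

```python
import numpy as np

def surface_variation(points, k=8):
    """Per-point curvature proxy in [0, 1/3]: the smallest eigenvalue of
    the local PCA covariance divided by the eigenvalue sum. ~0 on planar
    regions, larger at corners and creases."""
    var = np.empty(len(points))
    for i, p in enumerate(points):
        d = np.linalg.norm(points - p, axis=1)
        nbrs = points[np.argsort(d)[:k]]          # k nearest neighbors
        eig = np.linalg.eigvalsh(np.cov(nbrs.T))  # ascending eigenvalues
        var[i] = eig[0] / max(eig.sum(), 1e-12)
    return var

def adaptive_sample(points, base_keep=0.2, curv_boost=0.8, k=8, rng=None):
    """Keep each point with a probability that grows with its local
    curvature, on top of a uniform base density (the desired LoD)."""
    rng = rng or np.random.default_rng(0)
    v = surface_variation(points, k)
    p_keep = base_keep + curv_boost * (v / max(v.max(), 1e-12))
    return points[rng.random(len(points)) < p_keep]
```

On a planar facade this decimates points heavily, while samples at edges and corners, where the reviewed approaches show the largest errors, are retained.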
A static environment is a prerequisite of most MVS algorithms (Mostegel et al., 2016). Therefore, detecting and considering moving objects during data capturing is very important; however, this issue is widely ignored by most planning and image acquisition approaches. Another limitation concerns low-texture areas and building parts with different materials. In some cases, especially for buildings, low-texture images may prevent reconstructing the 3D model of the area or object. None of the above-mentioned model-based methods considers either moving objects or low-texture images in viewpoint planning.
Although some researchers have already employed Google Maps and satellite images for generating the geometric proxy, with the recent advancements in the concepts and realization of BIM, digital twins, and smart cities, rough 3D models of large structures like buildings and bridges are becoming increasingly available. Even LoD2 information is nowadays publicly available in some countries and could be used as a-priori information in viewpoint and path planning algorithms.

Evaluation strategies
Evaluating viewpoint and path planning methods for UAV-based 3D reconstruction poses many significant challenges (Debus and Rodehorst, 2021; Hepp et al., 2018b; Koch et al., 2019). First, the important criteria differ between communities and projects: while visual fidelity and completeness are important in some cases, accuracy is key in other applications, and in some projects and environments the overall flight time is critical. Second, environmental conditions such as lighting, weather, and surrounding objects may change; moreover, moving objects and modifications of the object itself over the course of time can affect the evaluations. Furthermore, ground truth information is typically not available (Debus and Rodehorst, 2021; Hepp et al., 2018b; Koch et al., 2019). Hence, in most studies, the evaluation on real datasets and scenes is restricted to a qualitative verification. Quantitative evaluations are usually possible and performed on synthetic datasets (cf. section 3.2), or a specific dataset is used, which makes a fair and standardized comparison of different approaches almost impossible. Last but not least, most evaluations investigate the quality of the generated 3D data (e.g. the point cloud). This reflects the performance of the whole pipeline of planning, data capturing, and 3D reconstruction (Figure 7), which is a non-separable mixture of these three steps. The interested reader can refer to (Seitz et al., 2006) and the leaderboard of (Knapitsch et al., 2017) for a comparison of reconstruction approaches. It should be accentuated that even in the case of perfect planning, the hardware and environment can affect the quality of the acquired images.
Figure 7: General pipeline of planning, data capturing and 3D reconstruction

Therefore, in this section we first describe the quality measures for the whole pipeline and then investigate measures that assess the quality of viewpoint and path planning separately.

Evaluation of the whole pipeline:
Denoting the ground truth as $G$ and the reconstruction result as $R$, we can compute the following quality measures:
• precision (how much of $R$ is supported by $G$)
• recall (how much of $G$ is correctly modeled by $R$)
• F-score (harmonic mean of recall and precision)
• accuracy (how close $R$ is to $G$)
Generally, both $G$ and $R$ could be a mesh or a point cloud. However, considering $R$ as a point cloud is better (avoiding surface reconstruction and its errors) and more convenient (avoiding sampling the mesh at its vertices (Seitz et al., 2006)). If $G$ is delivered as a mesh or CAD model, it is also better to use it directly, to avoid the approximations introduced by resampling it to a point cloud. Nevertheless, if $G$ is itself a point cloud (e.g. captured by laser scanners), we can use it directly in point-based evaluations. Following notations similar to (Knapitsch et al., 2017) and (Seitz et al., 2006), the quality measures can be expressed as follows:
Precision (Correctness): if the distance between a reconstructed point $r \in R$ and the ground truth is $e_{r \to G} = \min_{g \in G} \lVert r - g \rVert$, the precision of the reconstruction for any cut-off distance $d$ is defined as
$$P(d) = \frac{1}{|R|} \sum_{r \in R} \left[ e_{r \to G} < d \right],$$
where $[\cdot] \in \{0,1\}$ is the Iverson bracket, $|\cdot|$ is the cardinality of the set, and $P(d) \in [0,1]$. Selecting the proper value for $d$ depends on the application and affects the results. For example, (Hepp et al., 2018b) considered $d = 10$, which may not be acceptable for some high-quality reconstruction purposes.
Recall (Completeness): similarly, for a ground-truth point $g \in G$, its distance to $R$ is defined as $e_{g \to R} = \min_{r \in R} \lVert g - r \rVert$, and the recall of the reconstruction for any cut-off distance $d$ is $R(d) \in [0,1]$, defined as
$$R(d) = \frac{1}{|G|} \sum_{g \in G} \left[ e_{g \to R} < d \right].$$
F-score: summarizes the precision and recall in a single score as
$$F(d) = \frac{2\,P(d)\,R(d)}{P(d) + R(d)}.$$
Accuracy: various metrics based on unsigned and signed histograms of the distances from $R$ to $G$ can be used to measure the accuracy of the reconstruction. While the mean and standard deviation of the distances can be used (Huang et al., 2018), the median of the distances is more robust to outliers. Another statistic suggested in the literature (Seitz et al., 2006) is the distance $d_X$ such that $X\%$ of the points on $R$ are within distance $d_X$ of $G$. The median is the special case $X = 50$; in (Seitz et al., 2006), $X = 90$ is used for computing the accuracy.
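As an illustrative sketch (not taken from any of the reviewed papers), the point-based measures above can be computed with nearest-neighbor queries between the two point clouds; the function name `pipeline_metrics`, the array layout, and the percentile choice are assumptions for this example:

```python
import numpy as np
from scipy.spatial import cKDTree

def pipeline_metrics(R, G, d):
    """Point-based quality measures between a reconstruction R and a
    ground truth G (both (N, 3) arrays), for a cut-off distance d."""
    e_r2g, _ = cKDTree(G).query(R)  # distance of each r in R to its nearest g in G
    e_g2r, _ = cKDTree(R).query(G)  # distance of each g in G to its nearest r in R
    precision = np.mean(e_r2g < d)  # P(d): fraction of R supported by G
    recall = np.mean(e_g2r < d)     # R(d): fraction of G covered by R
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    acc90 = np.percentile(e_r2g, 90)  # d_90: 90% of R lies within this distance of G
    return precision, recall, f, acc90
```

The choice of $d$ remains the application-dependent knob discussed above: a single outlier point lowers the precision but leaves the recall untouched.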
The main benefit of using signed distances compared to unsigned distances is that they give a better sense of where and in which direction the reconstructed surface deviates from the ground truth (Maboudi et al., 2018; Seitz et al., 2006). To this end, one can visualize the distribution and inspect the trend of the signed deviations. Moreover, using unsigned distances changes the distribution of the errors: in the case of normally distributed signed errors, the unsigned errors follow the folded normal distribution (Tsagris et al., 2014), whose parameters differ from those of the underlying normal distribution.
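A minimal numerical illustration of this effect, assuming zero-mean normally distributed signed errors: taking absolute values shifts the mean from $0$ to $\sigma\sqrt{2/\pi} \approx 0.8\sigma$ and shrinks the spread, so summary statistics of unsigned distances describe a different distribution than the signed deviations.

```python
import numpy as np

rng = np.random.default_rng(0)
signed = rng.normal(loc=0.0, scale=1.0, size=200_000)  # simulated signed deviations
unsigned = np.abs(signed)                              # follows a folded normal

# The signed errors average to ~0, but the unsigned errors do not:
# for N(0, sigma) the folded mean is sigma * sqrt(2/pi) ~= 0.798 * sigma,
# and the standard deviation shrinks to sigma * sqrt(1 - 2/pi) ~= 0.603 * sigma.
```

Reporting the mean and standard deviation of unsigned cloud-to-mesh distances therefore mixes the parameters of two different distributions, which is one more argument for inspecting signed deviations directly.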
Evaluation of viewpoint and path planning:
To the best of our knowledge, there is no established framework for the direct evaluation of viewpoint planning. The most common approach is to report one of the metrics mentioned in the previous section as a function of the number of images used for the 3D reconstruction of the object (Huang et al., 2018; Kuang et al., 2020; Peng and Isler, 2019; Smith et al., 2018). However, since these metrics depend on both viewpoint planning quality and 3D reconstruction quality, such strategies cannot evaluate viewpoint planning in isolation. The topic of establishing an evaluation pipeline and proper quality criteria is discussed in (Debus and Rodehorst, 2021); based on the proposed pipeline, the authors evaluated several 3D UAS flight path planning algorithms. This pipeline focuses on pre-planned flight paths computed from available rough 3D models and is not transferable to planning in unseen and unknown environments that need to be explored.
Path planning quality, which aims at minimizing the mission duration, is also highly correlated with the number and distribution of viewpoints. Therefore, reporting the mission duration as a function of measures like recall and precision (Kuang et al., 2020) also depends on the view planning strategy. However, for the same set of viewpoints, path planning quality can be evaluated by measuring the flight path length and the data capturing time. UAV battery duration and the speed at which the images can be captured are the main constraints that should be considered in path planning (Smith et al., 2018). Table 3 summarizes the evaluation measures reported in the literature; for the sake of brevity, only model-based approaches are listed.
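For a fixed set of viewpoints, the two efficiency measures mentioned above can be computed directly from the planned trajectory. The sketch below makes the simplest possible assumptions (straight-line segments, constant speed, a fixed per-viewpoint capture stop); the speed and capture-time values are illustrative, not taken from the cited studies:

```python
import numpy as np

def flight_path_length(waypoints):
    """Total Euclidean length of a trajectory visiting the waypoints in order."""
    w = np.asarray(waypoints, dtype=float)
    return float(np.linalg.norm(np.diff(w, axis=0), axis=1).sum())

def mission_duration(waypoints, speed=5.0, capture_time=2.0):
    """Rough mission time: flight at a constant speed (m/s) plus a fixed
    capture stop (s) per viewpoint. Both parameters are assumptions."""
    return flight_path_length(waypoints) / speed + capture_time * len(waypoints)
```

Real mission-time models additionally account for acceleration limits, turning angles, and battery-driven return-to-home segments; this sketch only shows how path length and capture time combine into a single efficiency measure.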

[Table 3 fragment: geometric measures vs. efficiency measures per study; e.g. (Hoppe et al., 2012) reports precision as a cloud-to-mesh (C2M) distance.]

There are also some attempts to combine different quality metrics into a unified score for both the viewpoint and path planning steps, in order to make different approaches comparable. In (Debus and Rodehorst, 2021), an evaluation pipeline is introduced that employs a normalized score as a weighted combination of viewpoint and path planning quality, where Ψ is the quality factor of the flight path and the further terms account for the cost of the flight path length, the surface area of the 3D model, and the distance between camera and object. However, there is no optimal value for this normalized score, and it can be used only for a relative comparison of planning approaches. Moreover, hyper-parameters would have to be added to their equation (4) to tune the relative importance of the viewpoint and path planning quality measures for different applications.

Outdoor 3D reconstruction benchmarks
Preparation of reliable ground truth for a whole scene is not a straightforward task. Usually, more than one sensor must be utilized (Bouziani et al., 2021) to capture the entire object with an accuracy at least one order of magnitude better than that of the data under investigation. Therefore, many researchers perform qualitative evaluations on real datasets, while quantitative evaluation is performed on synthetic datasets (Hepp et al., 2018b; Yan et al., 2021).
Unreal Engine (UE) is a game development engine that produces photorealistic scenes. UE is by far the most common simulation environment in the viewpoint and path planning field (Hepp et al., 2018b; Kuang et al., 2020; Peng and Isler, 2018; Roberts et al., 2017). An open-source Python library called UnrealCV (Qiu et al., 2017) can also be used to set the camera parameters and control a virtual camera in the simulated environments (Peng and Isler, 2019; Roberts et al., 2017). Since there is full control over the camera parameters, the objects (like buildings), and also lighting and texture, it is possible to create many different scenes and reconfigure their attributes, which is very useful for evaluation purposes. Smith et al. (2018) created a benchmark consisting of 34 uniquely different buildings in five large urban scenes and packaged them within a release project of UE, which is publicly accessible via their project website. The authors thereby provide a new synthetic benchmark dataset and simulation environment for the UAV path planning problem in urban environments, making it possible to quantitatively evaluate other viewpoint and path planning algorithms. A small part of one of the synthetic samples from this benchmark is illustrated in Figure 8a. Microsoft AirSim (Shah et al., 2017), an open-source UAV simulator built on Unreal Engine, has also been employed in the literature to simulate the flight process and plan the UAV flight path (Arce Munoz, 2020; Kuang et al., 2020).
The Unity game engine is another environment, employed in (Sharma et al., 2019) to create a 3D scene (see Figure 8b). The scripting capabilities of the game engine provide the possibility to capture images with a virtual camera. An experimental Unity release of AirSim has also been announced. In some studies, e.g. (Koch et al., 2019), the authors generated their synthetic dataset (see Figure 8c) using other environments such as the open-source computer graphics software Blender (Blender Online Community, 2018), a well-known 3D modeling and rendering package. 3D architecture models from the 3D Warehouse of SketchUp have also been used (Huang et al., 2018); the authors used the Gazebo robot simulator for simulating and controlling the platform. Terragen, a scenery generator program, is another environment employed in the literature (Liu et al., 2019; Martin et al., 2015). Recently, the concept of an autonomous environment generator for UAV-based simulation based on machine learning algorithms was introduced in (Nakama et al., 2021): based on satellite images, this approach procedurally generates, scales, and places 3D models to create a realistic environment. The interested reader can refer to (Hentati et al., 2018; Nakama et al., 2021) for more details on simulation environments.

Discussion and future trends

Although much work and progress are observable in the field of viewpoint and path planning, existing algorithms are far from the last word on this topic, and there is plenty of exciting work left to do. In this section, we discuss the current challenges in viewpoint and path planning approaches, which point out directions for future research on this topic.
Most model-free approaches rely on NBV and rapidly traverse along the direction that decreases the model uncertainty, without using any prior information about the scene and object. Although this is very efficient for fast exploration of objects, their local search strategy within a non-linear objective function may get stuck in local minima, and it cannot guarantee complete coverage of the object, especially of its details. Furthermore, most model-free methods use stereo or depth cameras to capture data for 3D reconstruction, which means the platform has to get close to the object. Although this is feasible for indoor applications and small objects, it can be very time-consuming or even impossible for large-scale 3D reconstruction purposes. On the other hand, a-priori knowledge of the general geometry enables explore-and-exploit methods to optimize the coverage and accuracy of the results globally, leading to high-quality results and smoother trajectories. However, most existing explore-and-exploit approaches are off-line and do not update the computed poses based on online feedback from the data capturing unit. Therefore, considering the current computational power of on-board processing units, refining the global design of the poses and updating them locally based on feedback from the acquired images could be a very interesting direction for future research. Nowadays, this idea becomes more realistic with the availability of very fast data transfer technologies like 5G and the popularity of cloud-based processing. It could also close the gap between the two flight missions (exploration and exploitation), which require two visits to the site, something that is not always possible in practice.
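The greedy character of NBV selection, and why it can terminate without full coverage guarantees, can be sketched with a toy coverage loop over a precomputed visibility table. The table, view names, and patch ids below are hypothetical; real systems derive visibility from ray casting against the current partial model:

```python
def greedy_nbv(candidate_views, visibility, coverage_target):
    """Greedy next-best-view loop over a fixed candidate set.
    visibility[v] is the set of surface-patch ids seen from view v (a toy
    stand-in for ray-cast visibility). The loop stops once the target
    fraction of all patches is covered or no view adds new information."""
    all_patches = set().union(*visibility.values())
    seen, plan = set(), []
    while len(seen) / len(all_patches) < coverage_target:
        gains = {v: len(visibility[v] - seen) for v in candidate_views if v not in plan}
        best = max(gains, key=gains.get)
        if gains[best] == 0:  # no remaining candidate adds new patches
            break
        plan.append(best)
        seen |= visibility[best]
    return plan, len(seen) / len(all_patches)
```

Each step maximizes only the immediate information gain; nothing prevents the sequence from being globally suboptimal in path length, which is exactly the weakness that explore-and-exploit methods with a geometric prior try to avoid.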
Like almost all other engineering projects, one interesting direction could be the as-planned vs. as-is investigation. Even in the case of perfect planning, many other issues can affect the final outcome. Hardware limitations, meteorological conditions, object complexities, and sensor-object relative conditions are the main factors that can greatly increase the uncertainty of the motion and state of the UAV or degrade the quality of the final product. GNSS hardware and setup is one example that affects the data capturing quality: although GNSS provides valuable information for UAV-based data capturing and processing, when the platform flies close to the object of interest or under structures like bridges, the GNSS signal can be strongly disturbed. Therefore, integrating visual SLAM with GNSS-based navigation and data capturing could be an interesting topic to get the best of both worlds. Drift of the gimbal and the magnetometer compass affects capturing the data with the planned angles. Furthermore, due to safety issues, e.g. a minimum controllable flight height, some lower parts of the structures cannot be well captured and reconstructed (Kuang et al., 2020). To the best of our knowledge, most current approaches are designed for ideal conditions and neglect the effect of wind and other meteorological conditions; wind, for example, affects the platform's ability to reach the planned viewpoints. Some studies (Kim and Eustice, 2015; Papachristos et al., 2019b) addressed the problems caused by fast motion of the platform by considering the localization uncertainty of SLAM for view planning. They integrated exploration or coverage planning with an active SLAM approach that aims to minimize the localization uncertainty of SLAM, producing navigation paths that track many SLAM features to reduce that uncertainty. However, although their methods improved the localization accuracy, the UAVs are still unable to navigate quickly because the
generated paths do not satisfy the UAV's dynamic properties. The integration of active SLAM approaches (Kim and Eustice, 2015; Papachristos et al., 2019b) with fast motion generation methods (Cieslewski et al., 2017; Dharmadhikari et al., 2020) could be another direction for future work. Since battery limitation (flight time) is a challenge for almost all systems, multi-agent systems like (He et al., 2021) seem very promising for large-scale 3D reconstruction purposes; however, efficient and reliable task distribution between the agents remains challenging. Besides all deviations caused by hardware imperfections and environmental conditions, the appearance of objects with different materials can also affect the quality of 3D reconstruction. Shadows and illumination are also very important factors that can hinder reaching a high-fidelity 3D model, and reflective or low-texture surfaces like glass should be considered as well. Moreover, since texture is related to scale, it can pose another constraint on viewpoint planning, affecting the acceptable distance between the sensor and low-texture parts of the objects. Hence, the deviation of the as-is poses of the captured images from the as-planned ones necessitates an iterative closed loop with online feedback between the data capturing and planning modules, which could be another direction for future research.
Since perception of the surrounding environment is key for collision-free UAV flight, commercial UAVs are equipped with algorithms and sensors to perform a safe flight. However, when dealing with dynamic scenes (which is the case in many real scenarios), other considerations such as moving object detection come into play and should be considered during data capturing. Recently, (Tullu et al., 2021) employed a YOLO object detector (Redmon and Farhadi, 2018) to find obstacles and pedestrians in the UAV's navigation environment. This is also important to avoid occlusions caused by moving objects between the sensor and the object of interest.
While some algorithms prioritize reconstruction quality, others emphasize flight time (equivalently, trajectory length or the number of images). Future research could also concentrate on finding reasonable and flexible solutions that can switch between these two objectives, or even introduce hyperparameters for tuning the relative importance of 3D reconstruction quality and trajectory length in order to fulfill the requirements of different applications.
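Such a tunable trade-off could be as simple as a convex combination of a quality metric and a normalized efficiency term. The function below is a hypothetical sketch of this idea, not a score proposed in the reviewed literature; the weight and normalization are illustrative assumptions:

```python
def planning_score(f_score, path_length, max_length, w_quality=0.7):
    """Blend reconstruction quality and flight effort into one score in [0, 1].
    w_quality is an application-dependent hyperparameter (illustrative only):
    w_quality -> 1 favors model quality, w_quality -> 0 favors short trajectories.
    max_length is a normalization bound, e.g. derived from battery endurance."""
    efficiency = 1.0 - min(path_length / max_length, 1.0)
    return w_quality * f_score + (1.0 - w_quality) * efficiency
```

For a cultural heritage survey one would push `w_quality` toward 1, while a time-critical inspection mission would lower it; the point is that the trade-off becomes an explicit, reportable parameter rather than an implicit design choice.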

AI and Machine learning-based approaches:
Some studies have recently applied machine learning techniques to viewpoint and path planning. Hepp et al. (2018a) proposed an NBV planning method where the viewpoint's utility is computed by a 3D convolutional neural network (CNN). The method uses a multi-scale voxel representation of a partially explored scene as input to the CNN and learns utility scores of viewpoints from an oracle with access to ground truth information. Peralta et al. (2020) also proposed a learning-based NBV planning method for scanning houses. They provide a dataset of 3D house models for training NBV policies, and trained a deep Q-network and a deep deterministic policy gradient agent (Lillicrap et al., 2016) on the house dataset. Zeng et al. (2020) proposed a deep-learning network for NBV planning which directly uses raw point cloud data instead of a volumetric model. The network extracts features from the partially reconstructed point cloud and predicts the information gain of viewpoints from the extracted features; it can estimate the information gain more efficiently than conventional ray-casting approaches. An exploration planning method based on deep reinforcement learning is proposed in (Zhu et al., 2018). This method learns the topological information of an office-like environment, which provides a guide to efficiently compute the visiting sequence for unexplored regions. Martin et al.
(2016) proposed using genetic algorithms to optimize viewpoint planning for accurate 3D reconstruction using small UAVs. The coverage and accuracy of the generated model are formulated in the objective function of the genetic algorithm, which is defined based on the number of visible terrain points in the images of a solution set and the need to capture terrain points from multiple angles. The evaluation results show improvements in completeness and standard deviations of the case studies compared with a basic grid survey. A hierarchical framework for path generation, coverage path planning, and dynamic obstacle avoidance is presented in (Lei et al., 2022): satellite images and maps of farms are used via a deep learning method for path planning, and a Faster R-CNN network (Ren et al., 2017) localizes and identifies humans and vehicles as obstacles. Xie et al. (2021) proposed a deep reinforcement learning approach for path planning in a dynamic environment. They formulated the problem as a Partially Observable Markov Decision Process and solved it by utilizing local information and relative distances without global information; historical state-action sequences and multiple sensors are used by the proposed method to achieve more reasonable decision-making. Although there is a rich body of literature on AI-based path and coverage planning (Aggarwal and Kumar, 2020; Kaba et al., 2017; Pehlivanoglu and Pehlivanoglu, 2021; Sonmez et al., 2015; Zhang et al., 2021; Zhao et al., 2018; Zhou et al., 2021), viewpoint planning for 3D reconstruction based on AI and machine/deep learning approaches is still in its infancy, and there is large room to deepen the insight into this topic.

Figure 1 :
Figure 1: SfM and MVS heuristics in viewpoint planning: a) parameters, b) co-apex cones for forcing the suitable cameras to adhere to the heuristics, c) considering the segments of a hemisphere for viewpoint planning (Koch, 2020), and d) MVS-aware coverage model (Roberts, 2019)

Figure 2 :
Figure 2: Overall structure of explore-then-exploit methods

Figure 4 :
Figure 4: The adaptive viewing rectangles for different clusters of the scene patch (Peng and Isler, 2019)

Figure 5 :
Figure 5: Bi-level graph-based path planning in (Kuang et al., 2020)

Figure 8 :
Figure 8: Three examples of synthetic environments: a) part of one of the synthetic samples from (Smith et al., 2018); the left part of the scene is shown after texturing, while the right side is shown before texturing. b) A series of buildings in the Unity engine. c) Synthetic dataset created in (Koch et al., 2019) using Blender.

Table 1 :
Categorization and main features of model-free view-path planning methods

Table 2
summarizes the main characteristics of the discussed model-based approaches.

Table 2 :
Main characteristics of the discussed model-based approaches.

Table 3 :
Evaluation measures reported in the model-based studies