Combining Nadir, Oblique, and Façade Imagery Enhances Reconstruction of Rock Formations Using Unmanned Aerial Vehicles

Developments in computer vision, such as structure from motion and multiview stereo reconstruction, have enabled a range of photogrammetric applications using unmanned aerial vehicle (UAV)-based imagery. However, some specific cases still present reconstruction challenges, including survey areas composed of steep, overhanging, or vertical rock formations. Here, the suitability and geometric accuracy of four UAV-based image acquisition and data processing scenarios for topographic surveying applications in complex terrain are assessed and compared. The specific cases include the use of: 1) nadir imagery; 2) nadir and oblique imagery; 3) nadir and façade imagery; and 4) nadir, oblique, and façade imagery to reconstruct a topographically complex natural surface. Results illustrate that including oblique and façade imagery to supplement the more traditional nadir collections improves the geometric accuracy of point cloud data reconstruction by approximately 35% when assessed against terrestrial laser scanning data of near-vertical rock walls. Most points (99.41%) had distance errors of less than 50 cm between the point clouds derived from the nadir imagery and nadir–oblique–façade imagery. Apart from delivering enhanced spatial resolution in façade details, the geometric accuracy improvements achieved from integrating nadir, oblique, and façade imagery provide value for a range of applications, including geotechnical and geohazard investigations. Such gains are particularly relevant for studies assessing rock integrity and stability, and for engineering design, planning, and construction, where information on the position of rock cracks, joints, faults, shears, and bedding planes may be required.


I. INTRODUCTION
Topographic surveys are essential elements in supporting a range of civil engineering and geomorphological applications, including studies of erosion, deposition, and transport, structural stability and movement, and geohazard investigations [1]-[3]. Two decades ago, topographic data were predominantly collected using ground-based point sampling methods from differential global navigation satellite systems (GNSS) and total stations [4]. Since the introduction of laser scanning technologies, light detection and ranging (LiDAR) systems have become widely used to acquire topographic data [5]. However, both terrestrial and airborne LiDAR systems are expensive for repeated survey tasks, which are often required for applications where temporal dynamics are of interest. Furthermore, while terrestrial laser scanning (TLS) can provide very high spatial resolution surface models of high accuracy, the spatial coverage can be limited and also restricted by line-of-sight viewing requirements [6], [7]. Area accessibility can also pose a key constraint, particularly in hazardous or complex terrains. Traditional airborne laser scanning (ALS) can cover large areas within a relatively short time, with data acquisition specifications suited for detailed terrain analysis. On the other hand, ALS systems need significant expertise to overcome the complexity of the operation and still require line-of-sight viewing, and the cost of operation and equipment is usually much higher than for TLS operations for local-scale surveying tasks [6], [8]. Digital photogrammetry using optical remote sensing presents a relatively cost-effective alternative to acquire topographic data. Using well-established techniques of multiview stereopsis (MVS), the elevation of the Earth's surface can be estimated [9].
Factors such as image spatial resolution, sampling density, interpolation methods, and optical sensor models can significantly affect the accuracy of the produced topographic data [9]-[11]. Therefore, high-quality image capture, rigorous analysis, and user expertise are all required in the production of accurate topographic data, which has limited data provision to specialized consulting services in the past [8], [12]. In recent years, there has been dramatic growth in the use of low-altitude remote sensing platforms, particularly through the application of unmanned aerial vehicles (UAV) [13]. The installation of commercial-off-the-shelf sensors on these low-altitude aerial platforms enables users to acquire ultrahigh (sub-cm) spatial resolution imagery of the Earth's surface without the need for sophisticated instruments. They also overcome some of the challenges that can affect the remote collection of optical imagery, including variable atmospheric conditions (e.g., rainfall and cloud cover), via their capacity for rapid deployment [14].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The use of ultrahigh spatial resolution UAV-based imagery has seen increased uptake in many photogrammetric applications due to the development of computer vision techniques, such as structure from motion (SfM) and MVS reconstruction [15], [16]. The combination of MVS photogrammetry and SfM methods provides the opportunity to utilize optical UAV imagery for 3-D reconstruction, as it increases the sampling density of photogrammetry with higher spatial resolution imagery. The SfM-MVS 3-D reconstruction can now be achieved irrespective of the predefined position, orientation, and optical model information of the on-board cameras, and as a result, generally increases the accuracy of subsequent products [10], [17].
Such techniques have seen application in numerous studies but are, perhaps, most commonly employed in providing highly accurate digital surface models (DSM) for topographic surveys via the implementation of proper camera calibration and ground control points (GCP) [3], [18]-[21]. Conventionally, the imagery used for DSM reconstruction is acquired at nadir, which is similar to traditional aerial photogrammetry. James and Robson [19] suggested that, by collecting oblique images (around 20° off-nadir) in addition to nadir imagery, the accuracy of UAV-derived topographic data can be improved. However, in many natural landscapes, the survey area of interest may contain steep, overhanging, or vertical rock formations, which are unsuited for 3-D reconstruction from either traditional surveying or conventional UAV-based nadir and oblique sensor-viewing geometry.
The use of horizontal-viewing (façade) photographs in 3-D reconstruction is ubiquitous in disciplines such as cultural heritage preservation and civil engineering [22]-[24]. Koutsoudis et al. [25] suggested that accurate 3-D modeling of artificial objects reconstructed from ultrahigh spatial resolution façade imagery can achieve similar levels of detail as other sensing methods, such as laser scanning. Man-made structures and objects are usually easier to reconstruct than natural landscapes since a range of discernible features improves the operation of computer vision algorithms [26]. In a recent application, Jalandoni et al. [27] used ground-collected façade photographs to successfully reconstruct ancient artwork on a vertical rock surface. The use of façade photographs has also proven useful for natural surface reconstruction, although natural surfaces sometimes lack the distinct features required to take advantage of photograph-matching algorithms [28], [29]. James and Robson [12] used on-ground near-parallel façade photographs to reconstruct a coastal cliff section. The on-ground method was time-consuming and precluded the coverage of large areas. Therefore, acquiring façade photographs from a UAV offers an appealing alternative to increase efficiency and reduce potential data gaps [4].
The acquisition of façade or oblique UAV photographs is suitable for covering vertical walls that are not accessible by survey personnel. Berquist et al. [30] used oblique and façade UAV imagery to build a 3-D model of an inaccessible rock cliff section, using ancient artwork on the cliff as reference points. Barlow et al. [31] acquired parallel façade photographs from a UAV to reconstruct a sea cliff to analyze rock structural discontinuities. Jaud et al. [32] assessed the impact of camera viewing angles on the reconstruction accuracy of a cliff and did not find significant differences between data sets acquired at 20° and 40° off-nadir. In their analysis, the completeness of data reconstruction between the data sets was similar, except for areas with extruding or overhanging cliff surfaces. However, in [19], the lack of other convergent images of a coastal cliff resulted in errors related to systematic radial distortion. Such errors were likely caused by the dome effect, which often appears in SfM-produced topographic data using nadir UAV images due to the nature of bundle adjustment [18]. Circular scanning of sites, with the normal vectors of camera positions converging to a pseudopoint, is a common strategy for UAV-based collection of oblique images from near-vertical objects [33]. Martínez-Carricondo et al. [34] used the combination of nadir and oblique images to improve the accuracy of topographic data from a vertical wall under conditions of inadequate GCP distribution. Such findings indicate the potential to improve the topographic accuracy and data completeness through combining UAV imagery acquired from various viewing perspectives, especially in complex terrain containing vertical surfaces.
Here, we explore the combination of UAV-derived nadir, oblique, and façade photography to assess potential advantages for 3-D reconstruction and modeling of complex mountainous terrain with vertical rock formations. While the above studies indicate the potential benefits of façade photography [23]-[25], [31], we were unable to identify published work demonstrating UAV-based data collection procedures and processing results for integrating nadir, oblique, and façade photographs in complex terrain. As such, the overall objective of this study was to comprehensively assess and intercompare the suitability and geometric accuracy of UAV-based image acquisition and data processing scenarios for topographic survey applications of rock formations using: 1) nadir imagery; 2) nadir and oblique imagery; 3) nadir and façade imagery; and 4) nadir, oblique, and façade imagery. The outcome of this work has implications for civil engineering, geomorphological studies, and other disciplines requiring accurate topographic surveying, especially in complex terrains where conventional approaches are restricted due to accessibility and safety concerns. The research also provides information on the benefits of integrating off-nadir and façade photographs for 3-D modeling and visualization of vertical rock surfaces, which may be required for the assessment of geological structures to evaluate rock integrity and stability in the vicinity of infrastructure.

A. Study Site
The study site is located in the southwest of Saudi Arabia within the mountainous region of Aseer, covering an area of approximately 2.7 hectares. The orthometric elevation within the study area ranges from 1693 to 1823 m above mean sea level (see Fig. 1). The site contains many steep and vertical rock surfaces, with slopes varying from 0° to 90°. Thus, there is a requirement for repeated topographic surveys in order to estimate rock structural stability and identify potential safety issues to existing road infrastructure.

B. Data Collection and Processing
The project comprised three major data collection exercises, including the acquisition of GNSS survey data of GCPs, TLS surveys of vertical rock walls, and UAV-based red-green-blue (RGB) imagery. Fig. 2 presents the workflow of the subsequent data processing, with further details of the individual processing steps provided in Section II-B.
1) GCP Deployment, Surveying, and Processing: A total of 30 GCPs were deployed for georeferencing, with 21 GCPs applied for the TLS data, and nine used for the UAV data [see (Leica Geosystems). Reference data from continuously operating reference stations (CORS) were used to accurately establish the position of the base station. The calculated position and ellipsoidal elevation of the base station were then used to postprocess the RTK data. The EGM96 geoid model was used to determine the geoid separation and applied to the ellipsoid heights for orthometric elevation determination. Results were tabulated using WGS84 as the reference datum.
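The geoid correction described above can be sketched as follows. This is a minimal illustration, assuming the usual relation H = h − N between orthometric height H, ellipsoidal height h, and geoid separation N; the function name and the numeric values are hypothetical and are not survey results from this study.

```python
def orthometric_height(ellipsoidal_height_m: float, geoid_separation_m: float) -> float:
    """Return the orthometric height H = h - N, with all values in metres."""
    return ellipsoidal_height_m - geoid_separation_m


# Hypothetical inputs for illustration only.
h = 1700.00  # ellipsoidal height from post-processed GNSS observations
N = 5.25     # geoid separation taken from a geoid model such as EGM96
print(f"orthometric height: {orthometric_height(h, N):.2f} m")
```

In practice, the geoid separation would be interpolated from the geoid model at each GCP position rather than entered as a constant.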
2) TLS Data Acquisition and Processing: Seven TLS scans were performed using a FARO Focus X330 laser scanner (Faro Technologies, Warwickshire, U.K.) deployed on a 1.03-m tripod. Triplets of GCPs were located on the ground within 5 m of each scan location to facilitate accurate georeferencing [see Fig. 3(a)]. Each scan was performed at a distance of approximately 25 m from the rock façade. The scan resolution was 0.03° with a scan rate of 112 kpt/s. The angular area of each scan was 90° to −90° horizontally and 90° to −62.6° vertically. A photograph was taken at the end of each scan to enable rendering of the point cloud during postprocessing. The postprocessing of the TLS scans was undertaken within the Faro SCENE 5.3 software. Each scan was imported into SCENE and processed separately to generate multiple grey-scale point clouds, which were subsequently converted into colored point clouds based on the RGB photographs collected at the end of each scanning process. The GCPs were manually marked on each colored point cloud for georeferencing. Cloud-to-cloud registration was applied to stitch the seven separate point cloud chunks into one integrated point cloud as a reference for the rock façade. The result was exported for comparison against the UAV-derived SfM-MVS point clouds.
3) Flight Planning and UAV Image Acquisition: The UAV imaging system consisted of a gimbal-stabilized 20-MP Hasselblad L1D-20c camera (Victor Hasselblad AB, Gothenburg, Sweden) and two DJI MAVIC 2 Pro quadcopters (SZ DJI Technology Co., Ltd, Shenzhen, China). The flight planning and ground controls were carried out using the universal ground control station (UgCS) application (SPH Engineering, SIA, Riga, Latvia). The flight campaign consisted of three nadir-, three oblique-, and six façade-viewing flights. While all nadir-viewing flights and five façade scans were preplanned for autonomous flight missions, one façade scan and three oblique scans of the rock wall were flown manually due to safety and potential collision concerns. We collected oblique and façade scan images in the mornings and afternoons from sunlit rock walls to reduce shadowing effects, whereas nadir-viewing images were collected within two hours of solar noon. Table I shows the parameters of the different flight configurations. The distance between the sensor and the terrain surface was not consistent due to the extreme terrain variation. The three nadir-viewing flights were flown at various heights (1850, 1860, and 1910 m) above mean sea level to ensure safe clearance of the terrain. It should be noted that the camera viewing elevation of façade flights was set to 4° to avoid the influence of the propellers.
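Because the sensor-to-surface distance varied across the site, the ground sampling distance (GSD) of the imagery also varied. The sketch below illustrates the standard pinhole relation GSD = distance × pixel pitch / focal length for a surface perpendicular to the optical axis. The camera constants are assumed nominal values for a 1-inch, 20-MP sensor and are not taken from the text or from Table I; the distances are hypothetical.

```python
def ground_sampling_distance(distance_m: float, pixel_pitch_m: float, focal_length_m: float) -> float:
    """GSD in metres per pixel for a surface perpendicular to the optical axis."""
    return distance_m * pixel_pitch_m / focal_length_m


# Assumed nominal camera constants (illustrative, not reported in the study).
SENSOR_WIDTH_M = 13.2e-3   # assumed sensor width
IMAGE_WIDTH_PX = 5472      # assumed image width in pixels
FOCAL_LENGTH_M = 10.26e-3  # assumed focal length
pixel_pitch = SENSOR_WIDTH_M / IMAGE_WIDTH_PX

# Hypothetical sensor-to-surface distances.
for distance in (30.0, 60.0, 120.0):
    gsd_cm = 100.0 * ground_sampling_distance(distance, pixel_pitch, FOCAL_LENGTH_M)
    print(f"distance {distance:5.1f} m -> GSD {gsd_cm:.2f} cm/pixel")
```

The relation is linear in distance, so doubling the sensor-to-surface distance doubles the GSD.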

4) UAV Image Processing:
The collected UAV images were organized into four data sets: 1) nadir imagery (1287 photographs); 2) nadir and oblique imagery (1932 photographs); 3) nadir and façade imagery (3297 photographs); and 4) nadir,  [35]: particularly, the error reduction methods to ensure data accuracy. The coordinates of the photographs were converted from a WGS84 geographic coordinate system to a UTM projected coordinate system. The predefined camera model was checked before the initial photograph alignment. Photograph alignment was performed at the original scale, with a tie point limit of 4000 and a key point limit of 40 000 points. Once the photograph-alignment was completed, error reduction was performed based on reconstruction uncertainty and projection accuracy to remove those tie points with an accuracy below the USGS-suggested thresholds without omitting any aligned photographs. Values of ten for reconstruction uncertainty and two for projection accuracy were defined as the thresholds for error reduction at the initial phase. The GNSS-derived GCP coordinates were subsequently imported and manually registered to the GCPs visible in the UAV images, followed by another error reduction based on the reprojection error. The remaining tie points with an accuracy below the USGS-suggested threshold of 0.3 pixels were removed without omitting aligned photographs. This enabled the completion of the SfM-derived sparse point cloud. The root-mean-square errors (RMSEs) for data sets 1 (nadir), 2 (nadir and oblique), 3 (nadir and façade), and 4 (nadir, oblique, and façade) based on the GCPs were 1.55, 0.89, 1.84, and 1.37 cm, and the projection errors were 0.4, 0.49, 0.56, and 0.6 pixel, respectively. MVS dense clouds were then generated using the original scale (ultrahigh-resolution) and aggressive filtering for all data sets. Any visually identified noise in the dense point clouds was subsequently removed. 
Finally, the results were exported for comparison with the TLS-derived point cloud data. The point densities of data sets 1, 2, 3, and 4 were 0.13, 0.16, 0.19, and 0.2 points/cm², respectively. The produced SfM-MVS point cloud of each data set was then used to generate a DSM with a Poisson surface reconstruction method [36], followed by the generation of an orthomosaic based on the DSM and the mosaic blending mode.
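The GCP-based RMSE values reported above summarize the residuals between surveyed and reconstructed GCP positions. A minimal sketch of one common definition (the root of the mean squared 3-D residual length) is given below; whether the processing software uses exactly this definition is an assumption, and the residuals shown are hypothetical.

```python
import math


def rmse_3d(residuals):
    """RMSE of 3-D residual vectors (dx, dy, dz), in the units of the input."""
    if not residuals:
        raise ValueError("at least one residual is required")
    squared_lengths = [dx * dx + dy * dy + dz * dz for dx, dy, dz in residuals]
    return math.sqrt(sum(squared_lengths) / len(squared_lengths))


# Hypothetical GCP residuals in centimetres (not values from this study).
example_residuals_cm = [(0.8, -0.5, 1.0), (-0.3, 0.9, -0.7), (0.4, 0.2, 1.2)]
print(f"RMSE: {rmse_3d(example_residuals_cm):.2f} cm")
```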

5) Accuracy Assessment:
The accuracy of the UAV-derived point clouds was evaluated against the georeferenced TLS-derived point cloud for the rock façade using CloudCompare (version 2.12 alpha) [37]. The first step was to crop the data with the polygon of the study site to ensure consistent data volumes. Because the seven TLS scans were merged into one integrated cloud, geometric errors within each point cloud may have been propagated during the process [38]. Therefore, the iterative closest point (ICP) fine registration algorithm [39] was used to calculate the rigid body transformation matrix to fit the TLS data to the UAV data sets [see (1)]. ICP fine registration assumes that the two point clouds being compared are roughly aligned and that the shapes in the overlapping area are the same, and it only transforms points at a fine scale. The rigid body transformation matrix has the same unit as the coordinate vectors it multiplies. Multiplying the matrix by these coordinate vectors (easting, northing, height, and scale in a single column) yields the transformed coordinate vectors, which maintain the original unit (m). The transformation matrix was eventually applied to the TLS data before the comparison with each UAV-derived point cloud to minimize the geometric error between the TLS data and each UAV data set.
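The matrix multiplication described above can be sketched as follows. The example builds a 4 × 4 matrix following the element layout of (1), assuming a final row of (0, 0, 0, 1) and a scale component of 1 in the coordinate vector, and applies it to a single point. The function names and the input values are hypothetical; this illustrates the arithmetic only and is not the ICP implementation used by the software.

```python
import math


def rigid_body_matrix(theta, psi, phi, x_shift, y_shift, z_shift):
    """4x4 matrix laid out as in (1); angles in radians, shifts in metres."""
    st, ct = math.sin(theta), math.cos(theta)  # rotation around the x-axis
    sp, cp = math.sin(psi), math.cos(psi)      # rotation around the y-axis
    sf, cf = math.sin(phi), math.cos(phi)      # rotation around the z-axis
    return [
        [cf * cp + sf * st * sp, sf * cp - cf * st * sp, ct * sp, x_shift],
        [-sf * ct, cf * ct, st, y_shift],
        [sf * st * cp - cf * sp, -cf * st * cp - sf * sp, ct * cp, z_shift],
        [0.0, 0.0, 0.0, 1.0],  # assumed homogeneous row
    ]


def transform_point(matrix, easting, northing, height):
    """Multiply the matrix by the column vector (easting, northing, height, 1)."""
    vector = (easting, northing, height, 1.0)
    result = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
    return tuple(result[:3])


# Hypothetical small rotation and shift, applied to a hypothetical point.
matrix = rigid_body_matrix(0.001, -0.002, 0.0005, 0.12, -0.30, 0.02)
print(transform_point(matrix, 100.0, 200.0, 1700.0))
```

With all three angles set to zero, the rotation block reduces to the identity and the transformation is a pure shift, which is a convenient sanity check.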
Each finely registered TLS data set was then compared with the corresponding UAV-derived point cloud data. Because the TLS data had smaller data coverage than the UAV-derived point cloud data, it was set as the comparing data rather than the reference data in CloudCompare due to the restriction of the comparison algorithm employed by the software. Nevertheless, the magnitude of the resulting cloud-to-cloud alignment errors was the same regardless of which data set was assigned as comparing and reference data, and hence, the results provided an indication of the errors of the UAV data sets when evaluated against the TLS data. For every comparing point from the TLS data, a local surface model was calculated using a least-squares method with the nearest six points from the corresponding UAV-derived point cloud data. The absolute Euclidean distance between each point from the TLS point clouds and the corresponding local least-squares plane was calculated to determine the geometric accuracy of the rock façade within the four UAV-derived point clouds based on the nadir; nadir and oblique; nadir and façade; and nadir, oblique, and façade imagery, respectively.
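The point-to-local-plane distance described above can be sketched with NumPy as follows. For each query point, the six nearest reference points are found by brute force, a plane is fitted to them in the total least-squares sense via singular value decomposition, and the absolute distance from the query point to that plane is returned. This is an illustrative assumption about the computation, not the CloudCompare implementation, and the function name is hypothetical.

```python
import numpy as np


def point_to_local_plane_distance(query, reference, k=6):
    """Absolute distance from `query` to the least-squares plane through its
    k nearest neighbours in `reference` (an (n, 3) array-like)."""
    reference = np.asarray(reference, dtype=float)
    query = np.asarray(query, dtype=float)
    # Brute-force neighbour search: fine for a sketch, too slow for millions of points.
    squared_distances = np.sum((reference - query) ** 2, axis=1)
    nearest = reference[np.argsort(squared_distances)[:k]]
    centroid = nearest.mean(axis=0)
    # The plane normal is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(nearest - centroid)
    normal = vt[-1]
    return float(abs(np.dot(query - centroid, normal)))


# Hypothetical vertical wall at x = 5 m and a query point 0.2 m in front of it.
wall = [(5.0, y, z) for y in range(3) for z in range(3)]
print(point_to_local_plane_distance((5.2, 1.0, 1.0), wall))
```

Fitting the plane by its normal vector, rather than as z = f(x, y), keeps the fit well posed on near-vertical façades.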
Subsequently, the UAV-derived point cloud that had the lowest geometric error was selected as the reference point cloud for relative comparison with the three remaining UAV-derived point clouds. Such comparisons further assessed the point cloud accuracy for those parts of the study site that were not covered in the TLS-based point clouds. The comparison method for the UAV-derived data was the same as that used for the TLS data, except that no ICP fine registration was applied before the comparison. ICP was not applied to the UAV-derived data because all the UAV-derived data were georeferenced based on the same GCPs, and no further geometric manipulation of the UAV-derived point clouds was applied after GCP registration. In addition, the DSM of the selected UAV reference data set was subtracted from the raster DSM of each other UAV data set to more intuitively demonstrate the elevation differences, a process that was performed using QGIS [40]. Finally, the orthomosaic of each UAV data set was visually compared with the orthomosaic of the same UAV reference data set to check for significant spatial offsets.
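The DSM differencing step can be sketched as a cell-by-cell subtraction of two co-registered rasters. The example below assumes both grids share the same extent and cell size, that the difference is taken as data set minus reference, and that missing cells are flagged with a nodata value of −9999; these are illustrative assumptions, and the study itself performed this step in QGIS.

```python
import numpy as np


def dsm_difference(dsm, reference_dsm, nodata=-9999.0):
    """Return dsm - reference_dsm per cell, with NaN wherever either grid is nodata."""
    dsm = np.asarray(dsm, dtype=float)
    reference_dsm = np.asarray(reference_dsm, dtype=float)
    if dsm.shape != reference_dsm.shape:
        raise ValueError("both DSM grids must have the same shape")
    valid = (dsm != nodata) & (reference_dsm != nodata)
    return np.where(valid, dsm - reference_dsm, np.nan)


# Hypothetical 2 x 2 elevation grids in metres.
data_set_dsm = [[1700.5, 1701.0], [1702.0, -9999.0]]
reference = [[1700.0, 1701.25], [1702.0, 1703.0]]
print(dsm_difference(data_set_dsm, reference))
```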

A. Point Cloud Comparison Between the TLS and UAV-Derived Data
The collection of TLS data enabled the coverage of the lower sections of the rock formation up to a height of approximately 70 m [see Fig. 4(a)]. The scanned area was roughly divided into three parts, as shown in Fig. 4(b), with areas representing the east-facing façade, south-facing façade, and top of the rock formation. These defined regions are used to inform the analysis presented in Sections III and IV. The coverage of the TLS data was concentrated on the east- and south-facing façade sections due to the reduced line of sight. This is particularly obvious in the central area (where sections i and ii join) and at the top of the rock formation (section iii) due to the near-vertical rock wall obstructing the view of the area above. In the lower sections, some spots were also omitted in the TLS point cloud due to obstructing rocks and the fact that only seven scans (i.e., points of viewing) were used to create the point cloud. In contrast, the UAV imagery consists of thousands of viewing points, providing a more complete point cloud of the area of interest [see Fig. 4(b)].
The rigid body transformation matrices, which were applied to the TLS data before each comparison, are presented in the following. Transformations were necessary to minimize the influence of the geometric propagation errors caused by the merging of multiple TLS scans. They also ensure that the derived cloud-to-cloud errors between the TLS data and UAV point clouds could be meaningfully interpreted. The meaning of each element in the matrix is given in (1).
\[
\begin{pmatrix}
\cos\phi\cos\psi + \sin\phi\sin\theta\sin\psi & \sin\phi\cos\psi - \cos\phi\sin\theta\sin\psi & \cos\theta\sin\psi & X \\
-\sin\phi\cos\theta & \cos\phi\cos\theta & \sin\theta & Y \\
\sin\phi\sin\theta\cos\psi - \cos\phi\sin\psi & -\cos\phi\sin\theta\cos\psi - \sin\phi\sin\psi & \cos\theta\cos\psi & Z \\
0 & 0 & 0 & 1
\end{pmatrix}
\tag{1}
\]
where θ, ψ, and ϕ are the rotations around the x-, y-, and z-axes, and X, Y, and Z are the shifts along the x-, y-, and z-axes, respectively.
The rigid body transformation matrix to fit the TLS point cloud to UAV data set 1 (nadir images) was [matrix values not recoverable]. The transformation matrix to fit the TLS point cloud to UAV data set 2 (nadir and oblique images) was [matrix values not recoverable].
The matrices show that the horizontal offsets (the combination of the X shift along the x-axis and the Y shift along the y-axis) were two orders of magnitude larger than the vertical offsets (the Z shift along the z-axis) in UAV data sets 1 and 4 and at least one order of magnitude larger in UAV data sets 2 and 3. This may be due to the TLS data being concentrated on the façade parts of the rock formation and hence causing a lack of discernible features on the horizontal plane. The lack of horizontal features increased the geometric propagation errors in the horizontal directions during the cloud-to-cloud alignment process when merging the TLS scans. Both the Y shift along the y-axis (−15.011 m) and the Z shift along the z-axis (−0.409 m) were an order of magnitude larger in the transformation matrix that fits the TLS data to UAV data set 2 than in the other transformation matrices. This indicates that a larger transformation was applied to the TLS data to reduce the Euclidean distances to the point cloud of UAV data set 2. The larger transformation also means that larger relative errors in position exist between the TLS data and the point cloud of UAV data set 2. Therefore, we can expect that UAV data set 2 would have a larger difference in point locations compared with other UAV-derived data with lower transformation coefficients. The mean absolute point distance (point number ≈ 81.54 million) between the rigid-body-transformed TLS data and UAV data set 1 was 5.67 cm, with an 8.86-cm standard deviation. The mean absolute distance against UAV data set 2 was 5.74 cm, with a standard deviation of 6.80 cm. Although the mean absolute point distance of these two data sets was similar, UAV data set 2 was still considered to have a poorer geometric accuracy because the magnitude of the shift distance (the fourth column) in the transformation matrix was an order of magnitude larger than the shift distance of UAV data set 1. The mean absolute distance against the point cloud of UAV data set 3 was the largest among the four data sets, at 9.46 cm with a 10.48-cm standard deviation. Meanwhile, the difference between the rigid-body-transformed TLS data and UAV data set 4 was the smallest, with a mean absolute distance of 3.71 cm and a 5.43-cm standard deviation. Since the transformation matrices to fit the TLS data to UAV data sets 1, 3, and 4 were of similar magnitude, we can conclude that the transformed TLS data had the smallest geometric errors against UAV data set 4, which also means that the geometric errors between UAV data set 4 and the TLS data were the smallest among the four UAV data sets. The improvement in geometric accuracy of UAV data set 4 was found to be 34.57% compared with UAV data set 1.
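The reported improvement follows from the two mean absolute distances given above (5.67 cm for data set 1 and 3.71 cm for data set 4). A short worked check, assuming the improvement is defined as the relative reduction in mean absolute distance (the function name is hypothetical):

```python
def relative_improvement_percent(baseline_error: float, improved_error: float) -> float:
    """Relative reduction of `improved_error` with respect to `baseline_error`, in percent."""
    return 100.0 * (baseline_error - improved_error) / baseline_error


# Mean absolute distances against the TLS data, in centimetres, as reported in the text.
print(round(relative_improvement_percent(5.67, 3.71), 2))  # 34.57
```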
The percentage of points that had a distance error of less than 50 cm was 99.56% (data set 1), 99.92% (data set 2), 99.47% (data set 3), and 99.96% (data set 4) for the respective point clouds. From Fig. 5, it can be seen that not only did data set 4 have the smallest distance error but it also had the most homogeneous error distribution within the area covered by the data [see Fig. 5(d)], indicating the similarity between the TLS and data set 4 point clouds. Among data sets 1, 2, and 3, data set 1 had smaller distance errors (blue) distributed more homogeneously than the other data sets, except for areas where protruding rocks appeared within the near-vertical façades. This emphasizes that the nadir-viewing imagery precluded accurate reconstruction of vertical objects. On the other hand, larger errors (green and red) were distributed more evenly on the south-facing façade [see Specifically, for vertical and overhanging rock sections, UAV data set 4 performed the best, which is illustrated in Fig. 6, where the point distance between the TLS point cloud and the point cloud of UAV data set 4 was visibly lower (<20 cm) than the point distance between the TLS and UAV data set 1 point clouds (up to 50 cm for overhanging rocks). Based on these  observations, it can be concluded that UAV data set 4, which comprised the nadir, oblique, and façade imagery, had the lowest errors when evaluated against the TLS data. As such, it was selected for use as reference data against which the accuracy of the other UAV data sets could be assessed.

B. Point Cloud Comparison Between the UAV Data Sets
Unlike the comparison between the TLS data and UAV data sets, no rigid body transformation was required for comparing the UAV-derived point clouds. Therefore, it was expected that the resulting distance errors between the UAV data sets would be larger than the errors observed in the previous section. The mean absolute distance between data sets 1 and 4 [see Fig. 7(a)] was 5.69 cm, with a standard deviation of 8.8 cm (see Table II). From Fig. 7, it can be seen that the absolute distance between data sets 1 and 4 was generally less than 10 cm, while the geometric alignment errors were much larger between data sets 2 and 4, as well as 3 and 4. The comparison between data sets 2 and 4 exhibited the largest point distances along both the east- and south-facing façade sections but with smaller geometric offsets occurring in the top sections of the rock formation [see Fig. 7(b)]. The comparison of data sets 3 and 4 showed the largest point distances along the south-facing façade and the top of the rock formation, whereas the east-facing section of the façade only displayed minor offsets between points, generally of less than 10 cm [see Fig. 7(c)]. This phenomenon may be caused by differences in the rock surface-to-camera viewing angle between the two façade sections. The east-facing façade had steeper terrain than the south-facing façade (see Fig. 1), causing the south-facing façade to have a larger angle between the viewing direction of the façade photographs and the normal vector of the façade surface. A smaller angle between the viewing direction and the surface normal causes smaller ground sampling distance (GSD) variation, which provides higher topographic survey quality [29]. Fig. 8 shows a zoomed-in area of the point distances focused on the east-facing façade, where the steepest slopes were located (see Fig. 1). As shown in Fig. 8(a), there were gaps in data set 1 (i.e., comprised of only nadir images), indicating a limitation in reconstructing the terrain in areas with large terrain fluctuations, or where overhanging rocks were present. Although adding either oblique images [see Fig. 8(b)] or façade images [see Fig. 8(c)] achieved near-perfect data completeness, the accuracy was less than satisfactory compared with the reference data, which integrated the nadir, oblique, and façade imagery.
Fig. 7. Point distances between (a) UAV data sets 1 (nadir) and 4 (nadir, oblique, and façade), (b) data sets 2 (nadir and oblique) and 4, and (c) data sets 3 (nadir and façade) and 4, with data set 4 used as the reference data set. The color ramp from blue to red represents the cloud-to-cloud absolute distance.
To further explore the data, we calculated the least-squares linear regression between the absolute point distance and the terrain slope. The least-squares residual of each comparison was 9 cm between UAV data sets 1 and 4, 22 cm between data sets 2 and 4, and 32 cm between data sets 3 and 4. Overall, no linear relationship between point distance and terrain slope was identified for any of the data set comparisons. Extreme point distance values (mean value plus twice the standard deviation) occurred more frequently (10.06% for data set 1, 10.60% for data set 2, and 9.68% for data set 3) when the slope was between 85° and 90°. Fig. 9 shows the histograms of point distance in various ranges of terrain slope. The extreme point distances occurred more frequently on steep slopes, especially for the comparison of UAV data sets 2 and 4, where the maximum values of the extreme point distance increased up to 14 m [see Fig. 9(b)]. The occurrence frequency of points with large distances relative to those in data set 4 seems to increase significantly for data set 1 when the slope was higher than 75° [see Fig. 9(a)]. However, the frequency of points with large distances relative to those in data set 4 for data sets 2 and 3 was not much different irrespective of terrain slope [see Fig. 9(b) and (c)].
Fig. 9. Histograms of point distances within different ranges of terrain slopes of (a) UAV data set 1, (b) data set 2, and (c) data set 3. The y-axis, which represents the frequency of points, is displayed on a logarithmic scale.
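The slope analysis above can be sketched in plain Python. The example fits an ordinary least-squares line of point distance against terrain slope, reports the root-mean-square residual, and computes the share of extreme distances (above the mean plus twice the standard deviation) that fall within a slope range. Two points are assumptions: the text does not state how the reported residual was defined, and the reported percentages are interpreted here as the share of extreme points lying in the 85° to 90° range. All names and data are hypothetical.

```python
import math
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rms_residual(x, y, slope, intercept):
    """Root-mean-square residual of y about the fitted line."""
    squared = [(yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y)]
    return math.sqrt(sum(squared) / len(squared))


def extreme_share_in_slope_range(slopes_deg, distances, low=85.0, high=90.0):
    """Percentage of extreme distances (> mean + 2 * std) located in [low, high] degrees."""
    threshold = statistics.fmean(distances) + 2.0 * statistics.pstdev(distances)
    extreme_slopes = [s for s, d in zip(slopes_deg, distances) if d > threshold]
    if not extreme_slopes:
        return 0.0
    in_range = sum(1 for s in extreme_slopes if low <= s <= high)
    return 100.0 * in_range / len(extreme_slopes)


# Hypothetical terrain slopes (degrees) and absolute point distances (metres).
slopes = [10.0, 30.0, 50.0, 70.0, 88.0, 89.0, 20.0, 40.0, 60.0, 80.0]
distances = [0.05, 0.06, 0.04, 0.07, 0.05, 2.50, 0.05, 0.06, 0.04, 0.08]
a, b = fit_line(slopes, distances)
print(f"slope={a:.4f}, intercept={b:.4f}, rms residual={rms_residual(slopes, distances, a, b):.3f}")
print(f"extreme share in 85-90 deg: {extreme_share_in_slope_range(slopes, distances):.1f}%")
```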

C. DSM and Orthomosaic Comparisons Between UAV Data Sets
The DSMs produced from the point cloud of UAV data sets 1, 2, and 3 were also compared with the DSM produced from data set 4 by subtracting the DSM raster images from one another (see Fig. 10). Similar DSM elevation differences of more than −50 cm (blue) occurred at the top of the rock formation. Such consistent differences across all comparisons indicated that the DSM errors on the top of the rock formation may be due to the reference DSM of UAV data set 4 itself. Fig. 11 shows the DSM difference between UAV data sets 1 and 2 and indicates that the elevation differences on the top of the rock formation were significantly smaller, particularly compared with the DSM difference between either UAV data sets 1 and 4 or between UAV data sets 2 and 4 (see Fig. 10). As UAV data set 1 only comprised the nadir imagery and had the smallest projection error (0.4 pixels), the geometric accuracy of surfaces that the nadir imagery could easily observe should be higher, considering that the RMSE of tie points on GCPs was similar between UAV data sets 1 and 4. Hence, it may indicate that UAV data set 4 did not produce an accurate DSM for the top section of the rock formation, regardless of the smaller errors observed in the point cloud data of the façade sections [see Fig. 5(d)]. Overall, the DSM of UAV data set 1 was most similar to the reference DSM of UAV data set 4. Surprisingly, the DSM difference between UAV data sets 1 and 4 on the top section of the rock formation was higher (>50 cm) than the point distances shown in Fig. 7(a), which were around 30 cm, with only 0.59% of points in data set 1 exhibiting a distance of >50 cm compared with UAV data set 4. This may be because the surface reconstruction algorithm used by the software was affected by the small percentage of highly offset points and thus accumulated errors when producing the DSM.
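The raster subtraction described above is simple to express. The sketch below assumes both DSMs have already been resampled to the same grid and share a no-data value; the function name `dsm_difference`, the no-data value of −9999, and the tiny example arrays are illustrative assumptions, not values from the study.

```python
import numpy as np


def dsm_difference(dsm_test, dsm_ref, nodata=-9999.0, threshold_m=0.5):
    """Subtract a reference DSM from a test DSM defined on the same grid."""
    dsm_test = np.asarray(dsm_test, dtype=float)
    dsm_ref = np.asarray(dsm_ref, dtype=float)
    if dsm_test.shape != dsm_ref.shape:
        raise ValueError("DSMs must be resampled to the same grid first")

    # Only compare cells that hold data in both rasters
    valid = (dsm_test != nodata) & (dsm_ref != nodata)
    diff = np.full(dsm_test.shape, np.nan)
    diff[valid] = dsm_test[valid] - dsm_ref[valid]

    # Flag cells whose absolute elevation difference exceeds the threshold
    large = valid & (np.abs(np.where(valid, diff, 0.0)) > threshold_m)
    share_large = float(large.sum() / valid.sum()) if valid.any() else float("nan")
    return diff, large, share_large


# Toy 2 x 3 rasters (elevations in metres) standing in for two DSMs
test_dsm = np.array([[10.0, 10.2, -9999.0],
                     [11.0, 12.0, 13.0]])
ref_dsm = np.array([[10.1, 10.9, 10.0],
                    [11.0, 12.6, 13.1]])

diff, large, share_large = dsm_difference(test_dsm, ref_dsm)
print(diff)
print("cells beyond 50 cm:", int(large.sum()), "share:", share_large)
```

Keeping the difference signed, rather than taking the absolute value immediately, preserves the direction of the offset, which is what a blue/red difference map such as Fig. 10 displays.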
All orthomosaics displayed a systematic issue of duplicating objects, as well as some minor offsets between data sets. Fig. 12 demonstrates a systematic issue causing duplication of objects at the lower part of the south-facing façade and on adjacent flat ground. This issue appeared in the same area in all the UAV data sets and required manual seamline correction. Fig. 13 shows minor offsets (ranging from 5 cm to 1 m) of specific objects between UAV data set 4 and the other UAV data sets. From the analysis in Section III-B, the horizontal distance between UAV data sets 1 and 4 was found to be smaller than the vertical distance. Yet, the opposite was observed between UAV data sets 2 and 4 and between UAV data sets 3 and 4. Only small horizontal distances were observed between the orthomosaics in the areas around the GCPs, exhibiting offsets of 5 cm [see Fig. 13(a) and (e)] to 10 cm [see Fig. 13(c)]. However, larger offsets were found in areas without GCPs [e.g., 20 cm in Fig. 13(b) and 50 cm in Fig. 13(d) and (f)]. The horizontal offsets found in the orthomosaics were larger than the horizontal distance in the point cloud alignment analysis, where the mean horizontal distance was 3.47 and 10.60 cm for UAV data sets 1 and 2, respectively. This may be due to the potential error accumulation during the production of DSMs, as the orthomosaic image products were built based on the DSMs rather than rasterizing the point cloud on a projected coordinate plane directly. In addition, a traditional mosaicking algorithm is likely to be more suitable for near-parallel image data. It is possible that the issues observed in the orthomosaics were due to stitching problems, where inappropriate perspectives of the images were chosen by the stitching algorithm (which can be fixed by manual seamline editing).

A. Influence of Photograph Composition on SfM in Complex Terrains
The inclusion of both oblique and façade photographs, in addition to traditional nadir images, generally improved the geometric accuracy of the point cloud data retrieved from complex terrain in the surveyed rock formation. Although the point positions between data set 1 (produced from only nadir images) and data set 4 (produced from the combination of nadir, oblique, and façade images) were similar for the façade sections (i.e., an overall least-squares residual of 9 cm), the additional oblique and façade photographs improved the geometric accuracy when evaluated against the TLS data, especially in areas with large terrain fluctuations and where overhanging rocks were present [see Figs. 6 and 8(a)]. However, when adding either the oblique or façade images to the nadir photographs (data sets 2 and 3), the geometric accuracy worsened, especially in the south-facing façade area. Such a decrease in accuracy might be due to systematic radial topographic distortion caused by the bundle adjustment method used by the processing software. James and Robson [19] demonstrated that the bundle adjustment method can overestimate the surface terrain at the center of the site while underestimating the surface terrain at the site perimeters. Such radial distortion can be reduced by deploying GCPs at suitable locations [10]. However, there were no GCPs on the slope and façade to constrain the locations during the process of bundle adjustment, which was likely the main reason behind such high errors on the sloping and façade areas in data sets 2 and 3. To prevent such distortions with limited use of GCPs, previous studies suggested using predefined camera models and an accurate onboard GNSS system to reduce the errors in topographic data [10], [41]. The lack of transitioning images in data set 3 (i.e., images collected at a viewing angle in between the nadir and façade viewing angles) likely worsened the geometric accuracy.
The bundle adjustment method was designed to be more suitable for matching convergent images that have transitioning viewing angle differences [33]. A distinct change of the camera viewing angle of UAV photographs may affect the accuracy of the generated topographic data. Based on the results presented here, it is advised to use the combination of nadir, oblique, and façade images to prevent high topographic errors, especially when façade information is needed, e.g., for producing high-quality texture layers for the rendering of 3-D models suited for visual assessment or classification of features on steep or vertical rock walls, or if the geometric accuracy of façades is important, e.g., for slope stability assessment. Similar findings were also suggested in the survey of a dam when GCP distribution was inadequate [34].

B. Potential Influence of Surface-Viewing Geometry
The large angle between the viewing direction of photographs and the normal vectors of the terrain surface, especially the viewing directions of the façade photographs against the top section of the rock formation (approximately 70°), may be another reason for the large geometric errors in the top section of the rock formation in UAV data set 4 (see Fig. 10). Furthermore, the distance of the UAV to the top section when collecting the façade photographs was >200 m. The large distance and surface-viewing geometry increase the GSD variation of the photographs and result in poor-quality photographs, which decreases the geometric accuracy of SfM-MVS produced topographic data [29]. The large difference in point distances between UAV data sets 3 and 4 on the two different façade sections may also be due to the surface-viewing geometry. The east-facing façade had a steeper terrain than the south-facing façade, with smaller surface-viewing geometric variation between the façade photographs and the terrain surface occurring at the east-facing façade, and thus, smaller point distances were observed compared with the point distances of the south-facing façade [see Figs. 7(c) and 10]. Two potential solutions exist. The first solution is to iteratively execute the error reduction steps until reaching the accuracy criterion, which might result in the omission of some photographs during the process. While omitting photographs during the error reduction steps could have significantly reduced errors and DSM elevation differences at the top section of the rock formation, removing too many photographs would introduce gaps in the produced topographic data because of insufficient image overlap. Hence, users must test how many iterations of error reduction can be run before the gain in accuracy comes at the expense of data completeness. The second solution is to restrict the façade photograph acquisition to capture only the façade parts of the terrain, as suggested in [32].
In this case, the 3-D reconstruction at the top of the rock formation would only consider nadir and oblique images. The smaller the GSD variation in the photographs, the higher the quality of the topographic data achieved by the MVS method [42].
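The surface-viewing geometry discussed in this section can be made concrete with a short calculation. The sketch below computes the angle between a camera's line of sight and a surface normal, and uses the standard pinhole relation (GSD = distance × pixel pitch / focal length) with a 1/cos stretch for tilted surfaces. The camera parameters (4.4 µm pixel pitch, 24 mm focal length), the function names, and the example geometry are hypothetical and are not taken from the survey described here.

```python
import numpy as np


def incidence_angle_deg(view_dir, surface_normal):
    """Angle between the line of sight and the surface normal, in degrees."""
    v = np.asarray(view_dir, dtype=float)
    n = np.asarray(surface_normal, dtype=float)
    v = v / np.linalg.norm(v)
    n = n / np.linalg.norm(n)
    # The camera looks along v, so a head-on view has -v parallel to n
    cos_angle = np.clip(np.dot(-v, n), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))


def gsd_m(distance_m, pixel_pitch_m, focal_length_m):
    """Ground sampling distance for a surface that faces the camera."""
    return distance_m * pixel_pitch_m / focal_length_m


def stretched_gsd_m(distance_m, pixel_pitch_m, focal_length_m, angle_deg):
    """Approximate sample spacing along a surface tilted by angle_deg."""
    base = gsd_m(distance_m, pixel_pitch_m, focal_length_m)
    return base / np.cos(np.radians(angle_deg))


# Nadir photograph over flat ground: line of sight straight down
nadir_angle = incidence_angle_deg((0.0, 0.0, -1.0), (0.0, 0.0, 1.0))

# Horizontal (facade) photograph seeing a rock top tilted 20 degrees towards it
tilt = np.radians(20.0)
top_normal = (-np.sin(tilt), 0.0, np.cos(tilt))
facade_angle = incidence_angle_deg((1.0, 0.0, 0.0), top_normal)

# Hypothetical camera: 4.4 micrometre pixels, 24 mm lens, 200 m range
head_on = gsd_m(200.0, 4.4e-6, 0.024)
oblique = stretched_gsd_m(200.0, 4.4e-6, 0.024, facade_angle)

print(f"nadir incidence angle: {nadir_angle:.1f} deg")
print(f"facade view on rock top: {facade_angle:.1f} deg")
print(f"GSD head-on: {head_on:.4f} m, along tilted surface: {oblique:.4f} m")
```

At an incidence angle near 70°, the along-surface sample spacing is roughly three times the head-on value, which illustrates why a single photograph set can contain widely varying GSD.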

C. Influence of Redundant Information From Input Photographs
When processing the UAV-derived point clouds, visible noise was removed to improve the quality of the data. In this study, one of the most significant noise contributors was introduced by the additional façade photographs and their inclusion of sky color. From this perspective, capturing only the façade parts of the terrain can prevent not only those errors caused by the large angle between photograph-viewing direction and the normal vector of the terrain surface but also the potential influence of sky color from façade photographs. Although most of the sky-introduced noise can be removed during the error reduction phases and the point cloud densification step (with the aggressive filter applied), the color of the sky was still rendered on the MVS dense cloud result, which requires manual removal. If the sky above the horizon of the upper sections of façade scans is included in the photographs, it is suggested to mask out unnecessary information (i.e., the section of the sky above the horizon) in the images before the photograph-alignment step in Metashape Pro [43]. Such a process could be semiautomated by applying a color filter to mask the sky in a large number of photographs [44]. In addition, differences in illumination conditions between flights, as well as shadows, may affect the accuracy of SfM-MVS-generated point clouds [45].
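A color filter of the kind mentioned above might look like the following. This is a deliberately simple sketch: the rule (bright pixels in which blue dominates red) and the thresholds `min_blue` and `blue_margin` are hypothetical and would need tuning per flight; it is not the method of [44], and it will miss grey cloud or overexposed sky. Converting the boolean array into the mask file format expected by the processing software is not shown.

```python
import numpy as np


def sky_mask(rgb, min_blue=150, blue_margin=20):
    """Return True where a pixel looks like clear blue sky (hypothetical rule)."""
    rgb = np.asarray(rgb)
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    # Bright, blue-dominant pixels are treated as sky
    return (b >= min_blue) & (b - r >= blue_margin) & (b >= g)


# 2 x 2 toy image: clear sky, brownish rock, grey cloud, vegetation
image = np.array(
    [[[135, 206, 235], [120, 110, 100]],
     [[200, 200, 200], [60, 120, 50]]],
    dtype=np.uint8,
)

mask = sky_mask(image)
sky_fraction = float(mask.mean())
print(mask)
print("fraction flagged as sky:", sky_fraction)
```

In the toy image only the clear-sky pixel is flagged; the grey cloud pixel is not, which shows the limitation of a pure color threshold and why a manual check of the masks would still be advisable.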

D. Impact of Photograph Input Amount on Processing Efficiency
The ability to collect façade photographs to produce topographic data is mostly limited to local-scale analysis. Despite the potential accuracy improvements from the inclusion of oblique and façade photographs, the most significant drawback is the additional processing time due to the increased number of photographs. The machine that ran the image processing comprised 56 virtual central processing units (2.4-GHz Intel Broadwell cores), four NVIDIA Tesla M10 graphics processing units, and 480 GB of random access memory, running Ubuntu 16.04. It took 80 h (data set 1; 1287 photographs), 101 h (data set 2; 1932 photographs), 199 h (data set 3; 3297 photographs), and 290 h (data set 4; 3942 photographs) to complete the photograph matching, photograph alignment, depth map, and dense cloud generation. The processing time grew exponentially with the number of input photographs, similar to the findings of [46]. To mitigate this issue, an alternative is to process the MVS point cloud densification in multiple smaller chunks separately after the initial SfM tie points are created and the georeferencing and error reduction steps are completed. By splitting the original data into 3 × 3 chunks, the processing time of data set 4 was reduced by approximately two-thirds in the experiment. Despite these constraints, the incorporation of façade photographs can provide significant extra value to topographic data to accommodate various UAV applications [47].
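The growth in processing time can be examined directly from the four reported pairs of photograph count and hours. The sketch below fits an exponential model and, for comparison, a power-law model by least squares in log space; the choice of these two candidate models, the function name `log_fit`, and the derived "doubling" quantity are assumptions made for illustration, and four observations are too few to discriminate firmly between growth models.

```python
import numpy as np

# Photograph counts and processing times (hours) reported for data sets 1-4
photos = np.array([1287, 1932, 3297, 3942], dtype=float)
hours = np.array([80, 101, 199, 290], dtype=float)
log_hours = np.log(hours)


def log_fit(x, y):
    """Least-squares line y ~ slope * x + intercept; returns slope, intercept, SSE."""
    slope, intercept = np.polyfit(x, y, deg=1)
    sse = float(np.sum((y - (slope * x + intercept)) ** 2))
    return float(slope), float(intercept), sse


# Exponential model: hours = A * exp(rate * photos), so ln(hours) is linear in photos
rate, ln_a, exp_sse = log_fit(photos, log_hours)

# Power-law model: hours = B * photos ** k, so ln(hours) is linear in ln(photos)
k, ln_b, pow_sse = log_fit(np.log(photos), log_hours)

# Number of additional photographs that doubles the time under the exponential fit
doubling_photos = float(np.log(2.0) / rate)

print(f"exponential fit: rate = {rate:.6f} per photograph, SSE = {exp_sse:.4f}")
print(f"power-law fit:   exponent = {k:.3f}, SSE = {pow_sse:.4f}")
print(f"time doubles roughly every {doubling_photos:.0f} photographs (exponential fit)")
```

On these four points the exponential form leaves the smaller residual in log space, which is consistent with the description above; the fitted rate should be read as indicative only, since it depends on the specific hardware and software settings.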

E. Impacts of Photograph Inputs on Derived Raster Products
A factor that requires further research is the influence of photograph inputs on the subsequent survey products, especially the production of the orthomosaic. In previous studies where façade photographs were included, the final products were usually either a point cloud, 3-D model, or an orthomosaic in a local planar coordinate system, rather than a real-world projected coordinate system [32], [47], [48]. Conventionally, the orthomosaic produced for topographic survey applications would be composed mainly of nadir aerial images with the normal vectors approximately perpendicular to the ground surface [43], [49]. Henriques et al. [50] suggested that factors such as the lack of features, the small coverage of each photograph, and areas full of similar elements can significantly reduce the accuracy of both the produced DSM and orthomosaic. In our case, each photograph covered a relatively small area due to the close range of the UAV to the rock surfaces and had similar repeating elements (e.g., rocks) appearing frequently within the study site. Therefore, it is likely that besides the accumulated errors found in the DSMs, the characteristic of repeating elements, as well as small spatial coverage of each photograph, caused difficulties for the image stitching algorithm to select the optimal orthorectified images from appropriate perspectives to generate an accurate orthomosaic. Tu et al. [51] suggested that flying at a higher altitude with a higher resolution camera to acquire a larger image cover area (as well as smaller GSD) could potentially increase the topographic data quality. However, photographs with ultrahigh spatial resolution collected at a higher altitude may cause higher GSD variation within scenes in such a scenario, as they may capture both local minimum and maximum elevations in a scene with fluctuating terrain. High variation in GSD could introduce propagation errors caused by reconstruction uncertainty [42].
Therefore, it is not clear if this principle is applicable in complex terrain. Further research is required to identify the causality between flight parameters and effects on derived UAV-based products (i.e., point cloud, DSM, and orthomosaic) in a common processing workflow to develop guidelines on the collection and processing of UAV photographs for topographic survey purposes in complex terrain.

V. CONCLUSION
An assessment of the impact on geometric accuracy of UAV data collected in the form of nadir, oblique, and horizontally viewing imagery in a complex mountainous terrain was undertaken. The inclusion of oblique and façade imagery, in addition to nadir photographs, improved the geometric accuracy of SfM-MVS point cloud data when assessed against TLS data of near-vertical rock façades. The nadir viewing UAV photographs produced the second-highest geometric accuracy, while the lowest geometric accuracies were obtained when processing nadir and façade photographs together in the same assessment against the TLS data. The poor geometric accuracy when combining nadir and façade photographs was related to the lack of transitioning images and the large angle between the photograph-viewing direction and the normal vector of the terrain surface. Users should avoid acquiring images with a large angle between the viewing direction of the photograph and the corresponding normal vector of the terrain surface and a distinct change of camera viewing direction, which may reduce the accuracy of the produced topographic data. Despite issues such as increased processing time and potentially more noise caused by unwanted information, such as sky color in the point cloud data (which can be addressed with alternative workflows), the accuracy improvement when integrating nadir, oblique, and façade photographs and the higher resolution façade details may provide additional benefits and application value to the produced topographic data. Applications that may benefit from the additional spatial resolution of rock walls and higher geometric accuracy include geohazard investigation, studies assessing rock integrity and stability, and engineering and construction designs relying on detailed information of geological structures, such as rock cracks, joints, faults, shears, and bedding planes.
Future studies should focus on establishing an optimal protocol for the acquisition and processing of UAV imagery of complex terrain, particularly those comprising steep or near-vertical terrain elements, so that the derived products can meet the requirements of topographic survey applications.
Yu-Hsuan Tu (Member, IEEE) received the Ph.D. degree in Earth and environmental sciences from the University of Queensland, Brisbane, Australia, in 2019.
Since 2019, he has been a Post-Doctoral Fellow with the King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia. His research focuses on high-resolution remote sensing, UAVs, vegetation mapping, and spatial analysis.
Kasper Johansen received the Ph.D. degree in remote sensing from the University of Queensland, Brisbane, Australia, in 2007.
Since 2017, he has been a Research Scientist with KAUST, Thuwal, Saudi Arabia, where he is an Associate Editor of Frontiers in Remote Sensing. His main research interests are focused on processing and analyzing high spatial resolution remotely sensed multispectral, hyper-spectral, LiDAR and thermal image data acquired from satellite, airborne and UAV-based sensors to extract ecologically meaningful information.