Detecting Historical Terrain Anomalies With UAV-LiDAR Data Using Spline-Approximation and Support Vector Machines

The documentation of historical remains and cultural heritage is of great importance to preserve historical knowledge. Many studies use low-resolution airplane-based laser scanning and manual interpretation for this purpose. In this study, a concept to automatically detect terrain anomalies in a historical conflict landscape using high-resolution UAV-LiDAR data was developed. We applied different ground filter algorithms and included a spline-based approximation step in order to improve the removal of low vegetation. Due to the absence of comprehensive labeled training data, a one-class support vector machine algorithm was used in an unsupervised manner in order to automatically detect the terrain anomalies. We applied our approach in a study site with different densities of low vegetation. The morphological ground filter was the most suitable when dense near-ground vegetation is present. However, with the use of the spline-based processing step, all filters used could be significantly improved in terms of the F1-score of the classification results. It increased by up to 42% points in the area with dense low vegetation and by up to 14% points in the area with sparse low vegetation. The completeness (recall) reached maximum values of 0.8 and 1.0, respectively, when taking into account the results leading to the highest F1-score for each filter. Therefore, our concept can support on-site field prospection.


A. Airborne and UAV-LiDAR in Historical Landscapes
T HE detection and documentation of the remains of historical conflicts is of enormous relevance for their preservation and the retention of knowledge, the creation of commemorative landscapes and the understanding of their significance in historical contexts. This allows for a discussion of memory culture in order to address issues of practices of remembrance, which are an important component in the process of forming cultural heritage [1]. However, landscapes of memory are undergoing constant physical as well as discursive transformation for example due to anthropogenic land use, processes exacerbated by climate crisis or illegal excavations. This makes documentation The authors are with the Osnabrück University, 49074 Osnabrück, Germany (e-mail: marcel.storch@uni-osnabrueck.de; ndelange@uni-osnabrueck.de; thomas.jarmer@uos.de; bjoern.waske@uos.de).
Digital Object Identifier 10.1109/JSTARS.2023.3259200 of the historical remains even more important in order not to lose the cultural heritage irretrievably. In this context, conflicts of modern history, e.g., World War I and World War II, are becoming more important within the focus of investigations in conflict landscape studies, as the traces of these conflicts are increasingly threatened to disappear or being transformed beyond recognition. Documenting and investigating these historic sites of conflicts through an interdisciplinary approach is therefore relevant not only in order to broaden the perspectives on those landscapes, but also to preserve cultural heritage and to question and, if necessary, to refute narratives that may emerge over time [2], [3], [4].
In this context, many experts rely solely on on-site prospecting, however, fieldwork alone is often not sufficient since it is time-consuming and labor intensive. Remote sensing has the potential to provide spatially distributed information on land surface over large areas in near real time, and thus can help overcome these limitations [1], [2], [3], [4], [5].
A wide variety of sensor technologies has been used for this purpose over the last years [6]. Laser scanning technology [light detection and ranging (LiDAR)] has become one of the most significant sensing technologies in recent years when investigating landscapes where historical remains exist or are suspected to exist. The laser penetrates the vegetation cover to a certain extent and in this way can obtain information about the ground surface even in vegetated areas. This is particularly important in areas with historical remains. These are less affected by erosion processes if they are located below vegetation cover than in open terrain. Therefore, they are often better preserved in vegetated areas [7], [8], [9]. In the context of cultural heritage and historical remains, remotely sensed LiDAR data were used in various studies, e.g., for the identification of ancient cultural heritage like pit structures or cave entries [10], [11] or research on conflict-related battlefield sites [4], [12], [13].
Laser scanning from aircraft-mounted systems (often referred to as airborne laser scanning, ALS) is already widely used for archaeological and cultural heritage applications [6]. For several years, unmanned aerial vehicles (UAV)-based laser scanning has established itself in the remote sensing community. While this technology is already used in various environmental remote sensing studies [14], [15], [16], [17], it is rarely used in the context of archaeological and cultural heritage applications [5], [18]. Its systematic use with respect to the UAV flight parameters for scanning historical terrain features in different types This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ of vegetation was investigated [19]. In terms of achievable resolution, it should be noted that the absolute number of data points per square meter is significantly lower for ALS than for UAV-LiDAR. The latter can achieve point densities that are ten times higher than those of aircraft mounted laser systems [4], [19]. In addition, the laser footprint is much larger when using ALS (in the order of decimeters) than when conducting laser scans from a UAV (in the order of centimeters) [19]. Of course, this is also a measure of how detailed the Earth's surface and objects are captured by the system [20]. As a consequence, UAV-LiDAR data are becoming increasingly popular due to its higher spatial resolution and therefore a greater level of detail compared to ALS.
However, many studies in the context of cultural heritage and historical remains-including the abovementioned articles-rely primarily on visual interpretation of the data, which is a time-consuming and subjective task. The prior knowledge, level of expertise or interest of a specialist can influence the outcome of a manual interpretation [21]. Therefore, the main focus of this study is to propose a concept for automatic mapping of terrain anomalies in a historical conflict landscape using UAV-LiDAR data. The term terrain anomalies refers to man-made structures with regular shape. Specifically, this study focuses on terrain anomalies that are the result of historical conflicts, in this case dugouts created by soldiers, which are called foxholes (see Section II).

B. Background and Related Work
New challenges arise in order to automatically detect historical terrain anomalies using UAV-LiDAR: As a first step, it is necessary to perform ground filtering, which means to separate ground from nonground to create digital terrain models (DTM) from the LiDAR data. Several ground filter algorithms exist [22], however, for the case of UAV-LiDAR, it is well known that many established algorithms can fail in complex terrain situations with strong low vegetation influence [19], [23]. Unlike aircraft-based laser scanning data, many small on-ground features, such as low vegetation, are likely to be captured by UAV-based laser scanning due to its higher spatial resolution. Removal of low vegetation from UAV-derived point clouds is an active field of research [24], [25], however, most studies deal with the use of photogrammetric point clouds instead of using LiDAR.
In the context of remote sensing and cultural heritage, the identification of terrain anomalies is often based on a manual, visual image interpretation. Nevertheless, a few authors used (semi-) automatic mapping techniques (see Lambers et al. [26] for an overview). For instance, Due Trier and Pilø [27] and Toumazet et al. [21] followed template matching approaches that are based on estimating the similarity between a predefined model and the input image. Magnini et al. [28] used object-based image analysis (OBIA) techniques to detect shell craters from World War I from airborne LiDAR. Sevara et al. [29] compared semisupervised object-based and pixel-based classification approaches to identify archaeological features from ALS datasets. Some recent studies rely or are partially based on machine learning approaches for archaeological object detection [30], [31], [32]. More recently, deep learning methods, e.g., convolutional neural networks, have been introduced in the context of remote sensing [33]. These methods were successfully used, for example, to derive crop parameters from UAV data [34], [35]. Studies [36], [37], [38] use deep learning approaches for archaeological and historical feature detection.
In remote sensing, nonterrestrial laser scanners are primarily used on one of the following platforms, aircraft-mounted (airborne, ALS) or UAV-based. Many of the abovementioned studies in the context of automated mapping of historical terrain anomalies use ALS data for their investigations. In contrast, this study focuses on the use of UAV-based laser scanning. With regard to the use of UAV-LiDAR, however, the circumstances are completely different. The areas that can be covered by UAV-based systems are, of course, only a fraction as large as those that can be scanned by aircraft-mounted systems. Under good conditions, a LiDAR drone might be able to cover a few dozen hectares per day, which is only a fraction of the area that could be scanned by an ALS system in the same time (in the order of several square kilometers). This inevitably leads to the fact that there are only very few ground truth objects in a study area, and consequently only few training data available for the algorithms to be used. However, supervised classification algorithms rely on the creation of appropriate training samples, which is considered a key factor for the classification result. This also applies to many machine learning algorithms, especially deep learning, because they learn from the properties of the training samples. As outlined previously, these datasets are often not available in the context of historical / archaeological remains, especially not when using UAV data in small areas of only a few hectares. Other approaches, such as template matching-based procedures usually also rely on prior knowledge of the expected objects. Lambers et al. [26] emphasized that this can be problematic, especially for objects that are subject to transformation processes. Furthermore, OBIA-based methods often have the disadvantage that several parameters or thresholds are included in the detection and classification procedure (see e.g., [27], [28], [32]). Therefore, support vector machines (SVMs) as well as oneclass classifiers could be of interest. SVMs are generally considered a suitable machine learning approach when only limited training data are available [39]. This is also supported by the fact that only one class of interest-"terrain anomaly"-is involved in this classification problem. In these cases, a SVM can be defined as so-called one-class classifier: one-class support vector machines (OC-SVM), first introduced by Schölkopf et al. [40], were successfully applied to classify satellite data [41], [42], [43], [44], [45] for a variety of purposes.

C. Research Objectives
In this study, we propose a concept to map terrain anomalies in conflict landscapes using UAV-LiDAR data. The proposed approach combines ground filter algorithms with spline-based approximation of the terrain surface in order to reduce the effect of vegetation close to the ground. OC-SVM are used to map the terrain anomalies in the UAV-LiDAR data. In order to evaluate the potential of such a concept and its suitability for detecting historical terrain anomalies, we focus on the following research questions.
1) Can the results of ground filter algorithms be improved by spline-based approximation in terms of the recognizability of the terrain anomalies? 2) Is it possible to detect terrain anomalies by using a OC-SVM? 3) What impact do different vegetation densities and different filtering techniques have on the detection results? Therefore, the main contribution of this study is the development of a workflow for detecting terrain anomalies in a historical conflict landscape with UAV-LiDAR data. It includes the evaluation of different ground filters as well as a machine learning-based anomaly detection methodology.

A. Study Area
The research area is located in the Eifel region in western Germany: the Kall Valley (municipality Hürtgenwald, see Fig. 1). In November 1944, the US Army fought here against the Wehrmacht. As a result of several retreats in the steep terrain, which is difficult to access, the American soldiers had created a number of dugouts, known as "foxholes," to protect themselves from enemy fire [46]. Remains of these rather roundish dugouts can still be found today as terrain anomalies in the Kall Valley [4].
Specifically, the research area is located south-east of the village Vossenack and is mainly oriented in a north-south direction (see black box in Fig. 1). The area is partly covered by trees and is characterized by spatially differentiated low vegetation. In this valley section, we select two subareas as our study areas, each about 0.5 hectares in size. Study area #1 is characterized by dense near-ground vegetation (bushes and shrubs, which are present throughout the whole year) and deciduous forest. Study area #2 is an area in the Kall Valley where near-ground vegetation is less pronounced. Vegetation, such as ferns, dominate this area, but they are more prominent in summer than in winter. Since the data acquisition took place during winter season (see the following), the effect on the data acquisition is limited. Besides, individual trees are present in this area. Fig. 2 shows some sample photos of the two selected study sites. At the bottom of Fig. 2, three examples of terrain anomalies that were found in the Kall Valley during on-site prospection are shown. Table I summarizes the properties of the study areas, the flight, and scan parameters of the UAV-based data acquisitions and the achieved resolutions (point density and mean point spacing)   Table I): The system used for data acquisition is a RIEGL miniVUX-1 UAV laser scanner with a pulse repetition frequency (PRF) of 100 kHz mounted on a DJI Matrice 600 carrier drone. A flying altitude of 50 m above ground level (AGL) was chosen in order to keep the distance to the trees in the study area large enough and to ensure a permanent line of sight between the drone pilot and the UAV during the entire flight. In addition, the laser footprint is still small enough (less than 10 cm in diameter) when scanning from this altitude to allow detailed capturing of the Earth's surface [19]. Combining this with a flight speed of 3 m/s results in an average laser shot density of approx. 100 laser pulses per square meter. The actual point density achieved is higher due to the multiple return capability of the LiDAR-system. It is very likely that there are multiple return echoes for each laser pulse emitted, especially in areas with vegetation. On average, the point density ranges between 248 and 298 points per square meter in the resulting datasets, which corresponds to a mean horizontal point spacing of approx. 6 cm. The point density achieved is comparable to that of other studies involving UAV-based laser scanning in vegetated areas [47], [48].

B. Data Acquisition
Ground truth data, which in this case means the positions of known terrain anomalies, is acquired by visual interpretation of the data, supported by on-site prospection in the study area. Because the terrain in the Kall Valley is steep, predominantly densely vegetated, and therefore very difficult to access, an accurate total station survey of the shape of terrain anomalies cannot be undertaken. It is, therefore, limited to point-by-point approximate positioning of the existing anomalies using GNSS measurements.

III. METHODOLOGY
The methodology developed in this research consists of several steps. First, a preprocessing of the data is performed in the form of ground filtering. Three different filter algorithms are tested. Their results are further improved by computing a spline surface (see Section III-A). A differential morphological profile (DMP) is then created to increase the information content of the data and extract terrain features with specific shape and different size (see Section III-B). Subsequently, machine learning-based anomaly detection is implemented by applying a OC-SVM classification to the data in an unsupervised manner (see Section III-C). Finally, quantitative and qualitative validation is performed (see Section III-D). Fig. 3 depicts a flowchart illustrating the entire workflow of terrain anomaly detection.

A. Data Preprocessing
First, the generation of a DTM from LiDAR data requires a separation between ground and nonground data points, the so-called (ground) filtering. A variety of different filter algorithms exists and the choice of the filter algorithm is crucial in terms of the quality of the derived DTM [49]. Therefore, we test three different filter algorithms: The simple morphological filter (SMRF, Pingel et al. [50]), which is based on morphological operations, the cloth simulation filter (CSF, Zhang et al. [51]), which is based on simulating a rigid cloth on top of the inverted point cloud to identify ground points, and the adaptive triangulated irregular network filter (ATIN, Axelsson [52]), which uses a TIN-based approach to determine the ground surface.
However, the filtering algorithms may have problems filtering out low vegetation from the data [19], [22], [23], which is especially a problem when using high-resolution UAV-LiDAR data. Hence, the filtering procedure needs to be improved by removing LiDAR points that represent low vegetation that could not be successfully removed by the ground filter algorithms used. Therefore, we propose to approximate (smooth) the ground surface by using splines. We apply regularized splines with tension because they are characterized by the fact that they do not pass exactly through the input data. Instead, the terrain is approximated with a nonrigid surface of minimal curvature. For a more thorough mathematical discussion on the fundamentals of the smoothing problem, spline interpolation, and the application of regularized splines with tension to terrain surface approximation, the reader is referred to [53], [54], [55], [56], [57].
After calculating the spline surface, all LiDAR data points that are located above the generated spline surface are removed. They form the noise caused by low vegetation, whereas the details of interest that must be preserved are the downward terrain anomalies (this principle is illustrated in Fig. 4). The remaining ground points are then used to calculate the final DTM.

B. Multiscale Decomposition of the DTMs
Unlike, for example, multispectral or hyperspectral remote sensing data, which already consist of a large number of spectral bands, a DTM derived from the LiDAR height information is a single-band dataset. Therefore, in order to increase the information content of the data according to the objective of terrain anomaly detection, a multiscale decomposition by using a DMP is performed. The principle of DMP was proposed by Pesaresi and Benediktsson [58]. Although it was originally developed for segmentation and classification of high-resolution satellite images [58], [59], [60], its principle was also used for classifying ALS [61] and photogrammetric UAV point clouds [62].
A DMP consists of a sequence of operations from mathematical morphology. Opening (erosion followed by dilation) and closing (dilation followed by erosion) operations are used to highlight terrain structures of certain shape. In order to capture terrain anomalies of different extent, the size of the structuring element is increased step by step. The DMP is then formed by calculating the difference between two consecutive iterations. The components of the DMP are subsequently the input data for the terrain anomaly detection with machine learning.

C. Unsupervised Anomaly Detection Using Machine Learning
As outlined in the introduction, SVM are generally considered suitable for the cases where only limited training data are available [39]. This study aims on the detection of terrain anomalies, i.e., the separation of anomalous and nonanomalous pixels. As there is only one class of interest (i.e., the "terrain anomalies"), the use of one-class classifier is interesting. A general overview to one-class classifiers in the context of remote sensing is for example given in [63]. In this study, a OC-SVM is used. A brief description is given in the following.
OC-SVM for novelty detection was proposed by Schölkopf et al. [40]. Like normal SVMs, these are based on the computation of a separating hyperplane with the largest possible separating region in a higher dimensional feature space. However, OC-SVM tries to separate the data points x i seen during training from the origin, maximizing the distance to the origin. The underlying minimization function is with w and ρ being the weight vector and the offset of the separating hyperplane. Thereby (w · φ(x i )) ≥ ρ − ξ i applies, which means that each data point x i seen during training should be situated in the positive region, i.e., in the region separated from the origin by the hyperplane (mapped into a higher dimensional feature space by applying φ, see the following). The slack variables ξ i ≥ 0 and the regularization parameter ν ∈ (0, 1] control the number of misclassifications during training. N depicts the number of observations. Having calculated the separating hyperplane, the decision function for the classification stage then becomes with α i being the Lagrange multipliers and the function being the kernel, which uses the transformation function φ to map to a higher dimensional feature space. The function f returns positive for a sample that is similar to those seen during training (nonanomalous), negative for an anomalous data point [40]. A Gaussian radial basis function is used for the OC-SVM as this is a widely used kernel type and is common among other related studies [41], [42], [43], [63]. Classifiers, such as OC-SVM, are usually trained only with data from the positive class. OC-SVM, therefore, belongs to the group of so-called P-classifiers [63]. In this case, positive class means terrain without terrain anomaly. However, due to limited ground truth data, no labeled dataset is available. Consequently, when training the OC-SVM, it cannot be excluded that anomalous pixels (negative class) are also part of the training data. When using P-classifiers, it is usually assumed that the negative class is uniformly distributed [64]. Hence, an important parameter that has to be tuned when training a OC-SVM as introduced by Schölkopf et al. [40], [65] is the parameter nu. It is often depicted with the Greek letter ν, see (1), and acts as the regularization parameter [66]. It sets an upper limit of margin errors during training by allowing a certain percentage of the training data to be treated as anomalies (negative class). The other training samples are part of the positive region [42], [66]. Parameter values for the parameter ν between 1% and 5% should be suitable in case the positive and negative classes are well separable [43], [63]. Otherwise, this could lead to a high number of false positives (FP) or false negatives (FN) in the classification result. Since the distribution of terrain anomalies or their magnitude is not known in advance and an unsupervised approach is to be followed, different values for this parameter need to be tested for each application case.
Each dataset is divided into n patches. The training and classification procedure is run multiple times in order to be independent of the spatial distribution of training and test data. In each iteration, two thirds of the patches are used to train the OC-SVM. The respective model is then used to independently predict the remaining third of the data patches. This procedure is repeated until all possible combinations of patches have been used once for training. With a total number of n patches and two thirds used for training in each iteration, the total number of possible combinations is n 2n/3 , where each pixel is independently predicted 1 3 n 2n/3 times. The signed distance to the separating hyperplane averaged per pixel over all iterations is used for the final classification decision. If this distance is negative, the pixel in question is classified as an anomaly. Therefore, since several trained classifiers are combined, the chosen approach can be categorized as an ensemble classifier [67], [68]. Finally, a majority filter and a morphological closing operation are applied in order to remove individual noisy pixels.

D. Validation
The ground truth data, i.e., the known positions of terrain anomalies, are used for validation. Since the classification anomaly versus no anomaly is a binary one-class problem, it is common to use quantities, such as true positives (TP), false positives (FP), and false negatives (FN), for accuracy assessment. TPs are correctly detected anomalies, FPs are incorrectly detected anomalies, FNs are missed anomalies (present in the reference data but not in the classification output). Subsequently, the following accuracy measurements can be derived [69], [70]: Correctness or Precision = TP TP + FP (4) Each of these accuracy measures can take values between 0 and 1. By varying the OC-SVM parameter ν, several results are obtained per study area and filter (see experimental setup in Section IV-A). In order to test whether the results differ significantly or whether the differences are coincidental, a paired t-test is performed for each case. This statistical test may only be used if the differences are normally distributed. Therefore, a Shapiro-Wilk test of normality is applied to the residuals beforehand.

A. Experimental Setup
Three different filters (SMRF, CSF, and ATIN) are applied to the UAV-LiDAR data. Filtering with SMRF and CSF is implemented in point data abstraction library (PDAL), filtering with ATIN is performed using the software Agisoft Metashape, whose ground point classification tool is based on the ATIN algorithm [71], [72]. The parameterization of the ground filters is done as follows.
For the filter SMRF the maximum window radius parameter should typically be 10 m or greater, and the slope tolerance should not be lower than 0.1 [50]. In combination with this we set the cell size to 0.2 m, which results in a maximum elevation threshold of 1 m (0.1 × 0.2 × (10/0.2)). This roughly corresponds to the vertical extents of the foxholes, i.e., the terrain anomalies of interest in our study area. The final classification threshold is then set to 0.1 m similar to comparable study sites with steep/sloping terrain in other related studies [71], [73].
The parameterization for the CSF follows the specifications given in e.g., Zhang et al. [51] and Serifoglu Yilmaz et al. [71]. The maximum number of iterations is set to 500 and the rigidness parameter is set to 1 for steep terrain. We set the cloth resolution to 0.2 m and the classification threshold to 0.1 m similar to the SMRF cell size and threshold parameterization.
For the ATIN filter, Klápště et al. [72] proposed to set the initial grid size to at least 25 m. Specifically, this parameter is intended to resemble the size of the largest nonground object in the area, which corresponds in our study site to groups/clusters of trees, so this value should be suitable. The maximum angle parameter should be set to the mean slope of the area [71], which in our case is approx. 20 • . The maximum distance parameter is set to 1 similar to the maximum elevation threshold of SMRF, which resembles the possible elevation change in the terrain due to the terrain anomalies. This value also corresponds to the ATIN parameterization used in e.g., [71], [72].
Subsequently, in order to improve the filter results by removing misclassified low vegetation, we propose the use of splines (see Section III-A). For each filtered point cloud a spline surface is generated using GRASS GIS, and the points that are above the surface are removed. The procedure of data preprocessing is illustrated in Fig. 5 in a 25 m × 1 m transect taken from study area #1, which is visualized three dimensionally. The original point cloud data (a) contains a lot of dense vegetation. It is filtered (b), but there still remain a certain number of points representing low vegetation. A spline surface is calculated (c) and all data points above it are removed (d), which results in a smooth ground surface where the terrain anomalies stand out clearly. Then DTM are calculated by rasterizing the resulting point clouds [both without (b) and with (d) applying the spline-based processing in order to compare the effect achieved by the splines]. In each case, the minimum height value per raster cell is used, and the raster resolution is determined by the average horizontal point spacing. Empty cells are interpolated using inverse distance weighting. This rasterization procedure is performed with PDAL.
In the next step, the DMP (Section III-B) is calculated from each derived DTM with a circular shaped structuring element because the historical terrain anomalies of interest (foxholes) in the Kall Valley are round shaped. The radius of the DMP is increased in 1 m increments from 1 m to 5 m in order to extract both small and larger terrain anomalies. Since it is based on both opening and closing, this creates a total of ten bands. The DMP is calculated using SAGA GIS.
Subsequently, the normalized bands are the input for the OC-SVM classification. For this purpose, each study area is divided into 12 patches. In this way, each patch is about 10 000 pixels in size. As outlined in Section III-C, the training and classification procedure is run multiple times, each time using two thirds of the patches for training and one third being independently predicted. Having 12 patches and using two thirds of the data for training in each iteration, the total number of possible combinations is 495, where each pixel is independently predicted 165 times. For the regularization parameter of the SVM, values between 1% and 5% are tested in 0.005 increments. Since we follow an unsupervised ensemble-based approach here, the Kernel parameter γ is not additionally varied. Instead, it is kept at its default setting, which is the reciprocal of the number of features. The OC-SVM classification is implemented in a Python script, using the OC-SVM package from the Scikit-learn library [74].
For validation, the classification result is transferred into a vector dataset. In this way, adjacent pixels marked as anomalous (−1) together form a polygon representing a terrain anomaly in the result dataset. The positions of known terrain anomalies are available as point geoobjects (see Section II). To account for the possible inaccuracy of GNSS positioning during onsite prospection, an anomaly detected by the classifier is still classified as TP if it intersects a circle with one meter radius around a known reference position in the ground truth dataset. In case a terrain anomaly in the classification result has spatially more than one equivalent in the reference dataset, it is counted once as TP and the remaining times as FN. Likewise, in the case that an anomaly in the reference dataset has more than one correspondence in the classification result, it is counted once as TP and the remaining times as FP. Finally, the accuracy measurements completeness, correctness, and the F1-score are calculated.

B. Terrain Anomaly Detection
The validation results are shown in Fig. 6. Subfigure (a)-(c) refer to study area #1 with dense near-ground vegetation together with individual tree cover, subfigures (d)-(f) refer to study area #2 with sparse near-ground vegetation and hardly any tree cover. In each subfigure in Fig. 6, the results of the classification without prior smoothing of the respective ground points (light orange box plots) are compared in pairs with the results of the classification with prior application of the splinebased approach (dark green box plot). Each box plot shows the range of the completeness, correctness and the F1-score as the SVM parameter ν runs from 1% to 5% in 0.005 increments for the respective dataset. Thus, nine paired values are included in each of the plots.
In some of the plots, no whiskers (indicating the range from the minimum value to the first quartile Q1 and the range from the third quartile Q3 to the maximum value) and/or no horizontal line (indicating the median value) are visible. This is due to the fact that the validation is done on an object-by-object basis Fig. 6. Results of the OC-SVM terrain anomaly detection. Each box plot shows the distribution of completeness, correctness, and the F1-score as the SVM parameter ν iterates from 1% to 5%. The light orange plots depict the results without spline-based processing of the filter results, the dark green plots depict the results with spline-based processing of the filter results. For those validation quantities marked with *), the differences between the two means are statistically significant (α = 0.05). rather than pixelwise. For example, the validation dataset for study area #1 contains ten anomalies. This means that, e.g., the completeness can only take values in 0.1 steps in its value range. In this way, for instance, it can occur that the minimum value is equal to Q1 or the maximum value can be equal to Q3 (so that one or both whiskers disappear) and/or the median is equal to Q1 or Q3 (so that the horizontal line is not visible).
When comparing the results of the two study areas, it is noticeable that the detection of the terrain anomalies in the study area with less pronounced vegetation (area #2) has generally higher accuracy values than in the area with dense vegetation (area #1). It can be observed that in both study areas, a value of 1.0 could be reached for the completeness, the maximum value for the correctness, however, is 0.55 in area #1, and 0.83 in area #2. The F1-score reaches a maximum value of 0.57 in area #1, and a maximum value of 0.81 in area #2.
The detailed results for the two study areas underline the effect of the different filters and splines on the final terrain anomaly detection. In area #1, there is dense vegetation close to the ground and individual tree cover. The SMRF algorithm Fig. 6(a) already achieves an F1-score of 0.46 and a completeness of 0.8 without the spline-based smoothing of the terrain points. The results are improved in terms of the quality measurements by using the splines. The completeness reaches values of 1.0. In contrast to this, the sole use of the two other filters CSF and ATIN, Fig. 6(b) and (c) is limited in terms of the quality measurements, e.g., resulting in F1-scores of 0.08 (CSF) and 0.16 (ATIN). However, the results are improved by the additional use of the splines. For  TABLE II  RESULTS FOR THE OC-SVM PARAMETERIZATION LEADING TO THE HIGHEST F1-SCORES example, a completeness of 0.7 (CSF) or 0.8 (ATIN) is achieved, the F1-score ranges from 0.29 to 0.5 (CSF) and from 0.25 to 0.57 (ATIN). The differences with respect to the completeness when using SMRF or CSF Fig. 6(a) and (b) are remarkable, however, their residuals are not normally distributed. Therefore, the significance test could not be applied here. Except for the correctness when using SMRF, the differences between the two mean values in all other cases are statistically significant (α = 0.05) for area #1.
In study area #2, which is characterized by hardly any tree cover and only sparse near-ground vegetation, the filter methods SMRF and CSF Fig. 6(d) and (e) already achieve completeness values of more than 0.7 without the spline-based processing, and even more than 0.8 when ATIN is used Fig. 6(f) ATIN). After applying the spline-based processing step to the ground points, improvements are observed with respect to the correctness and the F1-score for the algorithms CSF (correctness 0.65, F1-score 0.65) and ATIN (correctness 0.72, F1-score 0.74). The additional use of the splines also improves the performance of the SMRF filter in terms of the F1-score (0.81) and the completeness (1.0). The improvements achieved by applying the spline approximation are statistically significant for the completeness when using SMRF, for the correctness when using CSF or ATIN and for the F1-score, independent of the filter algorithm used. Table II summarizes the results for the OC-SVM parameterizations that lead to the highest F1-scores in our study areas. As an example, Fig. 7 shows maps with the classification results using CSF as the filter method. On the left-hand side on each figure, hillshade visualizations of the DTMs are shown to illustrate the visual differences between the filter results without and with applying the spline-based processing step. On the right-hand side, the OC-SVM anomaly detection results for each DTM are presented. Pixels colored in black have a value of −1 and thus indicate an anomaly. The locations of known terrain anomalies from the ground truth datasets are marked by red circles. The terrain anomalies in study area #1 are mainly lined up in a row from north-west to south-east. In study area #2, they are spatially more evenly distributed.
The smoothing effect achieved by applying the spline-based processing step is visible by looking at the hillshade visualizations of the filter results without and with the splines. Besides, elongated objects are visible in both study areas, especially in area #2. They were not eliminated by the filter algorithm, but obviously removed during the spline-based processing step.
In the right-hand side column [see Fig. 7(b), (d), (f), and (h)], the OC-SVM anomaly detection results are depicted. Each anomaly (spatially contiguous black pixels) that intersects one of the red circles of the ground truth dataset represents a correct classification, i.e., a TP. An anomaly in the classification result that has no spatial correspondence to an object of the ground truth dataset is counted as FP. Likewise, if an object from the ground truth dataset has no equivalent in the classification result, it is counted as FN. The visual interpretation of the results in study area #1 underlines the (significant) increase in both completeness and correctness, which was already confirmed by the quantitative validation in Fig. 6 and Table II. When comparing the results with the highest F1-score, the values achieved when using CSF increase from 0.2 (without splines) to 0.5 (with splines) for the completeness and from 0.05 to 0.5 for the correctness. This is also confirmed by a visual interpretation of the results shown in Fig. 7(b) and (d). The total number of misclassified anomalies is reduced (FP), which increases the correctness [see (4)], however, at the same time a few more terrain anomalies were correctly labeled as anomaly (more TP, fewer FN), which leads to higher values for the completeness [see (3)].
Contrary to study area #1, the completeness does not improve in study area #2 when using CSF Fig. 6 and Table II, however, the correctness increases from 0.42 to 0.65 when comparing the results with the highest F1-score, respectively. Hence, Fig. 7(f) and (h) shows how the number of falsely detected objects decreases in the classification result.

V. DISCUSSION
In this study, the problem of mapping historical terrain anomalies with UAV-LiDAR data was addressed. We proposed a workflow that is based on the extension of commonly used filters by splines as well as OC-SVM to map the terrain anomalies. Three research questions were stated at the beginning relating to i) the spline-based approach, ii) the general performance of the OC-SVM, and iii) the impact of the low vegetation.
The qualitative improvements achieved by using the spline approximation are already confirmed by a visual interpretation of the hillshade visualizations of the DTMs. Fig. 7 illustrates this using the CSF filter as an example. In study area #1 Fig. 7(a) and (c), the noise caused by the dense near-ground vegetation could be drastically reduced by applying the additional splinebased processing. The terrain anomalies would not be visually identifiable without improving the filter result. In study area #2, elongated objects were still present in the DTMs after the initial filtering [ Fig. 7(e)]. These are attributable to deadwood, because deforestation processes took place in the Kall Valley in the past years [4]. Other studies confirm that laser scanning is a widely used remote sensing method to detect coarse woody debris [75], [76], however, improper filtering of the LiDAR data can be problematic when trying to separate ground from deadwood [77]. After applying the splines to the filter result [ Fig. 7(g)], the deadwood structures are removed from the data.
Furthermore, it can be quantitatively stated that the F1-score is always higher when using splines than without using splines (it is up to 42% points higher in area #1 and up to 14% points higher in area #2, see Table II). The spline-based processing step thus provides an additional benefit in this regard and apparently works well in this type of terrain. This confirms the research presented by Mongus and Žalik [78]. The authors also consider splines to be a suitable method for surface interpolation, especially in order to filter areas with steep slopes and vegetation. However, they used ALS data with lower resolution (approx. 0.5-3 points per square meter) on a different scale. Overall, it can be assessed that the extension of common ground filters by splines highlights the terrain anomalies in the DTM. Including the spline-based approximation of the terrain surface into the filtering process proofs useful in order to remove low vegetation and thus better identify terrain anomalies.
Moreover, we have investigated the general performance of a OC-SVM to automatically detect terrain anomalies. We developed an OC-SVM methodology in an unsupervised manner. The input features were generated by calculating a DMP from the DTMs. By applying this procedure to the different DTMs, terrain anomaly detection was performed and validation quantities of the classifications were calculated. Similar to e.g., [43], [63], we tested values between 1% and 5% for the OC-SVM parameter ν, which acts as the regularization parameter by allowing a soft margin to a certain extent. Also in our two study sites, the choice of this parameter in this range of values leads to acceptable (study area #1) or good (study area #2) results. Setting the parameter even higher would mean allowing more misclassifications, i.e., more anomalous pixels, during the training phase. This would lead to more terrain anomalies being detected as such in the classification result afterward (increasing of the completeness), but would generally also result in more FP (decreasing of the correctness). Conversely, setting the parameter ν even lower would result in a less permeable margin, making the model less general. Thus, fewer pixels would be reported as anomalous in the classification stage, decreasing the completeness but potentially increasing the correctness. In both cases, however, there would be no substantial improvement in terms of the F1-score, but this accuracy measurement should always be considered as well. Assuming the entire image is classified as anomalous, this would result in a completeness of 1.0, but of course such a result would not be usable at all. Hence, increasing or decreasing the parameter ν further would not produce more useful results for this application.
Although the computation time of the proposed workflow was not focus of this study, the runtime can be easily reduced: Since the final classification decision is based on several SVM instances, which all work independently and are only brought together at the end (ensemble classifier), the method can easily be parallelized.
In our final research questions we investigated the impact of different densities of near-ground vegetation on the detection results and the performance of different filter techniques. In general, the removal of near-ground vegetation LiDAR data points positively affects the classification results. In both study areas, the F1-score increases significantly.
In study area #1, where dense near-ground vegetation is present, the validation measurements increase significantly when using CSF or ATIN together with the spline-based approach. This does not apply to SMRF in study area #1. This filter algorithm apparently generates a better filter result in this situation than the other two. SMRF is the only algorithm that reaches a completeness value of 1.0 after applying the splines (the completeness is 0.8 when taking into account the highest F1-score). From this it can be concluded that SMRF is more suitable than the CSF method or the TIN-based ATIN filter when a lot of dense low vegetation is present in an area. This is confirmed to some extent by the research presented in [19], [72], who also consider the morphological-based ground filter SMRF more suitable for this type of terrain situation (terrain with slopes and dense low vegetation) than the other filters. Nevertheless, it can be stated that by applying the spline-based approach in order to remove low vegetation, we were able to improve the other two filter results in such a way that they become more comparable in terms of the F1-score.
In study area #2, where ground-level vegetation is much less abundant than in area #1, the classifications of the different filter results are more similar to each other from the outset. Therefore, the negative impact of the vegetation on the terrain anomaly detection is lower when compared to study area #1, and the filters CSF and ATIN generate more adequate results. However, deadwood was still present in the initial filter results (see the previous). Its removal apparently leads to less misclassifications of the OC-SVM, thus increasing especially the correctness. The completeness was not significantly increased when using CSF or ATIN. On the contrary, the opposite is true for SMRF: The completeness could be significantly increased up to a value of 1.0. Applying the splines to the results of SMRF results in more terrain anomalies being detected as such, which confirms the abovementioned statement.
There are deciduous trees present in study area #1, whereas there are hardly any trees in study area #2. However, it is unlikely that the tree population influenced the results of this study. Laser scanner beams penetrate the vegetation up to a certain extent [7], and since the data acquisition took place in winter during leafoff season, the presence of defoliated deciduous trees plays a subordinate role for the scanning of terrain anomalies [19]. Deforestation has only been occurring more frequently in recent years [4], so that longer-term processes, such as soil erosion can be ruled out. Therefore, the differences described previously are attributed to the low vegetation.
In summary, with respect to this research question, groundlevel vegetation and its removal have an impact on the detectability of historical terrain anomalies. This contrasts to some extent with the results of Bollandsås et al. [79]. The authors stated no significant effect on the detection rate of historical remains when smoothing the DTM, however, they use ALS data instead of UAV-based LiDAR data. On the one hand, their ALS data has point densities that range between 1 − 10 points per square meter, which is 10 − 100 times lower than the point density of the UAV-based LiDAR data that we acquired in the Kall Valley. On the other hand, the laser footprint diameter on the surface-which is also a measurement of how detailed the surface and objects can be captured [20]-is much higher for airplane-based laser systems than for UAV-based systems (in the order of decimeters versus centimeters [19]). For these two reasons, a drone-based laser scanning system accordingly captures more detailed, smaller objects from the outset, which then correspondingly also include low objects (i.e., deadwood) or vegetation near the ground. The challenges of the filtering procedure to separate terrain from nonterrain points are therefore different from those of airplane-based systems, which explains the differences to the study mentioned previously. Of course, in general, a drone-based system can detect smaller historical terrain anomalies than an aircraft depending on the different spatial resolution, therefore the scale for the application of the systems should be considered.
The detailed validation underlines the general suitability of the proposed workflow for detecting historical terrain anomalies with UAV-LiDAR data. It is noticeable that the achieved values for the completeness are generally higher than the correctness, i.e., no or only a few terrain anomalies were missed by the classification (FN/error of omission that influences the completeness), whereas the approach tends to misclassify nonanomaly objects as anomalies (FP/error of commission that influences the correctness). In order to serve as an effective support for subsequent fieldwork and reduce the time spent on the field, it is more important to detect as many anomalies in an area as possible than to produce as few FP as possible. An experienced historian or geographer could easily verify if an object from the classification dataset is indeed a terrain anomaly, or if it turns out to be a FP. All objects included in the classification result would only have to be examined once in the field. On the contrary, errors of omission would be much more difficult to detect when being on-site. Theoretically, the entire area would always have to be examined for FN, which would mean that the classification result would no longer provide any added value. A similar estimate of the impact of the different error types on the suitability for archaeological/historical purposes can be found in e.g., [21], [27], [29]. Therefore, we can state that our approach is applicable to support on-site prospection.
Further research could mainly address the following three aspects. First, the applicability of the methodology for terrain anomalies with other shapes could be investigated. On the one hand, this can refer to the shape of the objects from a bird's eye view (e.g., rectangular instead of round). By adjusting the DMP by using a rectangular structuring element instead of a circular one, it will be possible to extract such shapes from the terrain models as well. On the other hand, this can also refer to convex rather than concave anomalies. Second, the use of additional information, such as the (usually monochromatic, i.e., single-band) LiDAR return intensity could be tested. It is partly related to the moisture content of the surface [80] and could therefore help to distinguish between terrain anomalies and their surroundings. Third, the transferability with respect to other UAV data could be investigated, for example photogrammetric point clouds generated with structure from motion algorithms from optical RGB drone imagery. These approaches could also capture the geometry of historical terrain anomalies (depending on the elevation differences), but it is to be expected that these will be able to capture little to no ground information, especially in densely vegetated areas. However, such systems could still potentially be a more cost-effective alternative to UAV-LiDAR.
The research presented in this study provides a benefit for mapping and documenting historical terrain anomalies. This is particularly important because these landscapes are subject to (mostly anthropogenic) transformation processes caused by climate change, deforestation or re-enactors [4]. This also underlines the importance of field surveys that can be conducted following a remote sensing campaign.

VI. CONCLUSION
In this study, we proposed a workflow to detect terrain anomalies in a historical conflict landscape using UAV-LiDAR data. The results showed that the morphological ground filter was the most suitable when dense near-ground vegetation is present. By adding the spline-based processing step into the filtering procedure, the other two filters used were improved in such a way that their results become more comparable to each other in terms of the F1-score. Generally, the completeness (recall) was higher than the correctness (precision). However, in the research field of historical terrain anomaly detection, it is more important to detect as many anomalies in an area as possible than to produce as few FP as possible. Therefore, our approach can support on-site prospection.
Marcel Storch received the M.Sc. degree in geoinformatics and remote sensing from Osnabrück University, Osnabrück, Germany, in 2018. He is currently working toward the Ph.D. degree in geoinformatics.
Since 2018, he has been a Research Associate with the Institute of Computer Science, Osnabrück University. His research focuses on the use of laser scanning technology for conflict landscape research, particularly scanning and detecting historical terrain anomalies using UAV-based LiDAR.
Norbert de Lange studied geography and mathematics with the Universities of Bochum and Münster. He received the doctoral degree from Münster University, Münster, Germany, where he taught and did research with the Institute for Geography (among others with a focus on quantitative methodology in geography).
Following his habilitation in 1987, he became a Lecturer, in 1993 he received a professorship from Münster University for regional information systems and computer cartography, and in 1994, a professorship from Osnabrück University, Osnabrück, Germany. In 2005, he was involved in founding the Institute for Geoinformatics and Remote Sensing, Osnabrück University, which has since merged with the Institute of Computer Science, where he worked from 2016-2021 as a Professor for environmental computer science and planning. Most recently, he dealt with topics, such as environmental monitoring, augmented reality, and citizen science.
Thomas Jarmer received the diploma in physical geography with a major in remote sensing and soil science and the Ph.D. degree in geography from Trier University, Trier, Germany, in 2003. In