Automatic Weighted Splines Filter (AWSF): A New Algorithm for Extracting Terrain Measurements From Raw LiDAR Point Clouds

Airborne raw light detection and ranging (LiDAR) measurements are georeferenced three-dimensional coordinates of ground surface, including all natural and man-made features. Extracting terrain surface measurements from raw LiDAR data is referred to as “filtering.” Many filtering algorithms have been published, indicating the difficulty of the task; however, none performs equally well on all kinds of landscapes. This article presents a new algorithm, automatic weighted splines filter (AWSF), to extract the terrain points from raw LiDAR measurements. The mathematical model of the AWSF algorithm utilizes both the cubic smoothing splines (to interpolate and fit raw LiDAR data) and z-shaped function (to estimate the weight value of each point). The AWSF algorithm performance is compared against 14 filtering algorithms published between 1998 and 2019, as well as one unpublished (proprietary) algorithm designed by the world-leading company in processing remote sensing data, Harris Geospatial Solutions. Diverse landscape scenarios are used, ranging from open fields, rural land, and urban areas to dense forests on mountains where most of the filtering algorithms struggle as these areas represent the most difficult and challenging landscapes because the canopy prevents LiDAR pulses from reaching the ground surface. A total of 19 samples were tested; the results clearly show that the filtered terrain measurements are accurate and that the performance of the AWSF algorithm is stable for all the LiDAR data samples in comparison with the other filtering approaches.


I. INTRODUCTION
H IGH-RESOLUTION digital terrain models (DTMs) can be generated from raw airborne light detection and ranging (LiDAR) data [1]. LiDAR technology is an active remote sensing system that calculates the distance between the laser scanner and the ground surface by measuring the two-way travel time of the emitted laser pulses [2], [3]. Raw LiDAR point clouds are very dense, irregularly distributed, and geo-referenced threedimensional (3-D) measurements [4], [5]. LiDAR points could represent any natural or man-made feature on the ground surface, such as vehicles, trees, buildings, roads, power lines, etc. [6]. Removing all the off-terrain measurements (nonground points) to extract only the terrain measurements (ground points) from raw LiDAR data is referred to as "filtering" [7]- [9]. However, filtering LiDAR data in dense forests are still a challenge as the canopy prevents the emitted pulses from reaching the ground surface. Therefore, specialized approaches have been developed specifically for this purpose such as [10]- [14]. Several approaches for filtering raw LiDAR point clouds have been presented in the last two decades [15]. They can be categorized into three main groups: morphologybased, segmentation-based, and interpolation-based methods.
The first category uses the morphological processes such as erosion, dilation, and opening to filter raw LiDAR data [16]- [19]. The concept of these approaches is simple; it is based on comparing the elevation values within specific predefined horizontal distance (window) [20]. The morphological opening process can remove the measurements, which have higher elevations but retain the lower elevation measurements. This process is repeated using different horizontal distance values that should start from a small value and increase gradually through the iterative steps [21], [22]. Liu and Lim [23] presented a voxel-based multiscale morphological filtering method that applies a convexity constraint to LiDAR measurements. Then, the morphological operations are applied with a moving window to classify the data into terrain and off-terrain points after applying Otsu segmentation with a threshold value as a last step.
The segmentation-based group of filters deals with LiDAR point clouds as a raster image (interpolated LiDAR measurements). The whole raster LiDAR data are segmented and processed after identifying the segmentation parameters such as scale and weight for each layer [20]. These methods try to identify the segments of measurements raised higher than the surrounding segments, and then classifying the high segments as off-terrain measurements and the low ones as terrain measurements [24]. To improve the quality of the resulting data, additional parameters such as curvature and height difference, low and high vegetation, and smooth and planar surfaces can be used [25]. However, the segmentation-based methods are efficient for urban areas, but they struggle and fail in rural and forested areas due to the difficulty in identifying a distinct segment of measurements [26]. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ The third category is based on interpolating the ground surface by using the whole LiDAR dataset [6]. Usually, the interpolation-based methods work on predicting the terrain surface by using a linear function to interpolate the data and a weight value could be assigned to the measurements [27]. Brovelli et al. [28] used bicubic and bilinear functions to interpolate the data, the residuals' histograms are computed as a step for estimating the threshold value. In addition, the gradient magnitude of the surface is calculated for improving the performance of the method. Kraus and Pfeifer [11] utilized the linear prediction for interpolating the randomly distributed LiDAR data using some ground control points. Then, the weights of the data points are calculated to enhance the classification of the terrain and nonterrain measurements. Chen et al. [29] presented a method that starts by identifying the last-return LiDAR measurements. Then, the prediction of the terrain surface is done iteratively.
Some filters are designed to process only the forested areas [3], [29], [30], and others to filter the urban landscapes containing large buildings [2], [31]. Silva et al. [32] compared the performance of some airborne LiDAR data filtering algorithms in forested areas. The algorithms are weighted linear least-squares, multiscale curvature classification, progressive morphological filter, and progressive triangulated irregular network. Nevertheless, there is a group of filtering approaches that does not belong to any of the abovementioned categories, for example, the triangulated irregular network-based methods [33]- [35] and the statistical-based methods [36]- [38]. Hui et al. [39] suggested that a filtering process starts by removing the outlier points, then estimating the ground surface by fitting the data using a quadratic polynomial function. After that, the method assumes that LiDAR data represent a mixture of Gaussian models, which can be used for refining the classification of the data into terrain and off-terrain measurements. Although the algorithm endeavored to automate the process of extracting the ground points, its performance was moderate in comparison with the other published work. Cai et al. [40] filtered the data through three main stages. In the first stage, the algorithm generates an initial ground surface estimation by applying a cloth simulation for identifying some ground control points. Then, the parameters' values are estimated using statistical analysis. Finally, the progressive TIN densification process is applied for extracting the final terrain measurements. Cai et al. [41] presented a model transfer-based filtering methodology that combines transfer learning theory with active learning for identifying the terrain points. It requires a training dataset as a core for this method. Bayram et al. [42] exploited weighted graph representations for the analysis of LiDAR data. This method considered the data as irregularly distributed signals and applied an iterative graph signal filter to remove the off-terrain measurements from the raw data.
A new algorithm for filtering raw LiDAR data is presented in this article. The proposed automatic weighted splines filter (AWSF) belongs to the interpolation-based filters. Unlike the other interpolation-based methods, which mostly use linear prediction or bicubic functions to interpolate the data, the AWSF has exploited, for the first time, the cubic smoothing splines function, which is very effective for such noisy data, as the smoothing parameter controls the fitted curve and leads to reserving the terrain measurements efficiently. Also, the z-shaped function is used to vigorously assign a weight value for each measurement. More significantly, the AWSF is designed to process raw LiDAR data with different densities and diverse landscapes that range from open fields, roads, railroads, power lines, buildings, rivers, vegetated areas to heavily forested areas. The detailed methodology is described in part 2 of this article, whereas parts 3 and 4 present the results and conclusions of the research.

II. AUTOMATIC WEIGHTED SPLINES FILTER (AWSF)
The proposed filter is based on processing raw LiDAR data by using cubic smoothing splines and z-shaped functions. A mathematical background about the smoothing spline interpolation is presented in the following section. After that, the AWSF processing stages are explained.

A. Cubic Smoothing Splines Interpolation
A cubic spline interpolation function f (x) is a series of different cubic polynomials on each interval [ξ i , ξ i+1 ], such that the derivatives of f (x) are continuous up to the second order. Let the coordinates of the ith measurement be (x i , y i , z i ), i = 1, 2, . . . , n, where n is the number of the interpolated measurements. The form of a cubic spline f (x) on each interval is [43] z where a i , b i , c i , and d i are the coefficients that can be determined from the definition of cubic splines. Continuity and interpolation require that . Also, at each interval node of a cubic spline, the first and second derivatives should have the same values; ). In addition to the abovementioned equation, we can benefit from the properties of the "natural spline" function by considering the second derivatives at the end nodes equal to zero; f 1 (x 1 ) = f n−1 (x n−1 ) = 0. All the previous conditions and properties will help to determine the values of the coefficients a i , b i , c i , and d i . After some manipulation, we get [43], [44] Once the coefficients c i are determined, from (2), the remaining coefficients could be found simply as follows [43], [44]: There are stable mathematical methods for determining the coefficients a i , b i , c i , and d i and solving such systems. For more details, see [45]- [47]. For some applications, it might be required not to interpolate the data measurements exactly, but to create kind of smooth spline, which aims to pass close to the measurements but not necessarily through them. Cubic smoothing spline functions represent the solution, by minimizing [47] where w i is the weight for the ith measurement, and α is the smoothing parameter. Criterion (4) consists of two terms, the first one measures how closely the function fits the data points, and the second one measures the smoothness of the fitted function. The smoothing parameter value ranges between 0 and 1 [47]. When α = 0, the resultant fit would be a straight line due to integrating the second derivative of a cubic function, whereas choosing α = 1 leads to a cubic spline interpolation, as illustrated in Fig. 1, because the second part of (4), which smooths the resultant curve, is eliminated. Minimization of (4) could produce a smooth fitted curve that follows the general trend of the measurements but does not interpolate them exactly. The AWSF algorithm exploits this significant property of the cubic smoothing splines function to help in filtering raw LiDAR data.

B. AWSF Algorithm
The AWSF algorithm introduces an automatic filter for identifying and removing the off-terrain measurements from raw Li-DAR data. MATLAB codes have been developed to implement all the processing stages described in the following sections.
1) Raw LiDAR Data Gridding: Raw LiDAR point clouds can be described as a random set of measurements covering the scanned area. Therefore, the first step of the filtering process would be converting these randomly distributed data into a uniform grid [18]. To generate the grid, a transformation between the irregular raw LiDAR measurements and the regularly spaced grid is applied. The dimensions of the grid-m rows and l columns-can be determined as follows [18]: where G s is the grid cell size (spatial resolution). The selection of the G s value depends on LiDAR data density. The denser the raw data, the smaller the grid cell size. The value assigned to each grid cell is the coordinates of the LiDAR measurement that falls into that cell. However, because of the irregular distribution of the measurements, some grid cells have no LiDAR measurements, while other cells have more than one. In the latter case, when there is more than one LiDAR measurement, the AWSF algorithm assigns the grid cell the measurement with the lowest elevation (z − coordinate) as a first attempt to establish the terrain surface. Consequently, the created 2-D array A{m, l } contains either empty cells or the coordinates (x, y, and z) of the LiDAR points.

2) Data Normalization, Fitting, and Residual Computation:
The aim of the normalization is to improve the accuracy of the numerical measurements before creating the fitted curve. To normalize a set of data, the measurements should be centered at zero mean and scaled to the standard deviation value of the measurements as follows [48]: where Z = z 1 , z 2 , . . . , z n , μ z is the mean value of Z, and σ z is the standard deviation of Z. For each row of the LiDAR dataset A, (6) is applied to normalize the data. After that, Criterion (4) is applied to fit (X, Z) data using a cubic smoothing spline function, where Z = z 1 , z 2 , . . . , z n . At this stage of processing, it is important to determine the smoothing parameter value α, which varies from 0 to 1. For the purpose of LiDAR data filtering, it is desired that the fitted curve passes close to the data measurements; α ≈ 1. Many tests were conducted to find the optimal value of the smoothing parameter, it is found that the α value should range from 0.9999 (for forested areas) to 0.99 (for all other areas). The fitted curve follows the general trend of the terrain and off-terrain LiDAR measurements without necessarily passing through each point. The difference between the elevation of each LiDAR measurement z i (l) and the elevation of the fitted curve z i (f ) is called the residual value υ i . The residuals for all LiDAR measurements are computed using (7); these values will then be used to assign a weight to each LiDAR measurement [11] as Fig. 2 shows a sample LiDAR data, the fitted curve, and the residual values. Obviously, LiDAR measurements with negative residual values lie below the fitted curve, and hence, more likely to be terrain measurements, whereas, on the other hand, points with positive residual values lie above the fitted curve, and therefore, more likely to be off-terrain measurements.

3) Computing the Weights by Using z-Shaped Function:
In the first iteration, the curve is fitted with equal weights for all the z i values. A z-shaped function is used to estimate the weights w i as follows [49]: where σ and τ are the shift and threshold values, respectively. σ is the negative value of the standard deviation of the residuals υ i . The measurements that have a residual value less than σ will be assigned a large weight value of 1 as they are more likely to be classified as terrain measurements.

4) Removing Off-Terrain Measurements:
The threshold value τ is used for determining the candidate measurements to be removed. All the LiDAR measurements with residual values υ i more than τ would be considered as off-terrain measurements and removed. The initial threshold value τ o should be a small value so that the algorithm can remove as much as possible of the off-terrain measurements. For forested areas τ o = 0.25 and for all other areas τ o = 0.50. The weights w i , which are computed by applying the z-shaped function described in (8), will then be used in a second iteration. LiDAR measurements having large negative residual values will be allocated large weight values, and consequently they attract the fitted curve toward them, whereas measurements that have positive or small negative residuals will be given smaller weight values to minimize their effect on the fitted curve. Fig. 3 illustrates that if the measurement has a residual value less than σ, then its weight will be 1 and, therefore, has more influence on the computations of the curve fitting. On the other hand, if the measurement has a residual value more than τ , then its weight will be 0, and therefore, it will be removed. The weights increase gradually from 0 to 1 when the residual values decrease from τ to σ. The previous processing steps are repeated for each row of the raw LiDAR dataset A. 5) Removing Blunders: Generally, there are two kinds of blunders in LiDAR data, positive blunders and negative blunders. Commonly, positive blunders emerge from laser returns from birds, aircraft, etc., and negative blunders come from pulses that are reflected several times or malfunction of a laser rangefinder [50]. Positive blunders can be removed by completing the first iteration over all the rows of A. However, negative blunders still exist. Nie et al. [51] suggested a three-step method to remove the negative blunders. In the first and second steps, the method tries to automatically remove the blunders; after that it requires manual intervention to remove the rest of the measurements, which cannot be removed automatically. The AWSF introduces an automatic approach to overcome the issue of manual editing by classifying the measurement, which has a residual value υ i less than −3σ r (σ r is the standard deviation of the residuals) as a negative blunder. In other words, if a point lies below the fitted curve by more than 3σ r , it will be classified as a negative blunder and removed from the data, as illustrated in Algorithm 1.

6) Iterations and Terrain Measurements Extraction:
The weight w i of each retained measurement is used in the subsequent iterations to compute a new fitted curve. To improve the results, the columns of the LiDAR dataset A could be filtered by applying the previous steps. After filtering all the rows and columns for the first iteration, τ values should be decreased gradually from large to small in the following iterations. It is preferable to start the second iteration using large τ value to stay in the safe range and not to remove any terrain measurement. After implementing many tests on raw LiDAR data with different densities covering various kinds of areas, it was clear that starting the second iteration with a threshold value of 7 (or greater), is sufficient. Therefore, the following values are used for all the study areas τ = [7, 6, 5, 4, 3, 2, 1]. Finally, as illustrated in Algorithm 2, all the previous processing stages should be applied for each value of τ . Fig. 4 illustrates the iterating process to remove the off-terrain LiDAR measurements and extract the terrain points. The preserved measurements are gathered in one file and classified as terrain measurements. Algorithm 2 illustrates the main stages of raw LiDAR data filtering using the AWSF algorithm. Basically, after loading the raw LiDAR data, it is required to provide the grid cell size and the type of the scanned area (forest/other).
Therefore, there is no need to define the smoothing parameter and the threshold values as they are predefined and fixed along all the processing stages. However, the algorithm allows the user to test a sample data of the area of interest and adjust the parameter values easily to enhance the results.

III. EXPERIMENTS AND RESULTS
The performance of the AWSF algorithm is tested using two different LiDAR datasets. The first one is the data provided by ISPRS commission 3/WG3, which includes 15 subsamples [8].  [55] The error values of the filtered data, using the AWSF algorithm, were computed and compared against corresponding values (the same 15 subsamples) from other algorithms, which have been published between 1998 and 2019. The second dataset consists of four samples in heavily forested areas that represent a challenge for most of the filtering algorithms. DTMs were generated from the filtered data and compared pixel-by-pixel against very accurate reference DTMs. Both of the raw LiDAR data and reference DTMs were obtained through EDINA digimap LiDAR data service [52]. Additionally, the performance of the AWSF was compared against professional software designed by the world-leading company in processing remote sensing data, Harris Geospatial Solutions.

A. ISPRS Dataset
Seven samples of raw LiDAR data were collected to test the filtering algorithms presented by researchers from all over the world. These samples were divided into 15 subsamples based on the feature type within each subsample. The reference data for each subsample were extracted manually, and each measurement was labeled either terrain or off-terrain. Samples 1, 2, 3, and 4 represent urban areas with a raw LiDAR data density of 0.67 points/m², whereas samples 5, 6, and 7 cover rural areas with less density of the raw data; 0.18 points/m². Table I illustrates the type of feature included in each subsample.
The following equations were used to calculate two error types; the first type occurs when a filtering algorithm classifies genuine terrain measurements as off-terrain measurements. On the other hand, classifying off-terrain measurements as terrain measurements is the second type of error [53] calculated as Type 1 error (%) = N g t g × 100 Type 2 error (%) = N n t n × 100 where t g and t n are the total number of terrain and off-terrain measurements, respectively. N g is the number of the removed terrain measurements, and N n is the number of the preserved offterrain measurements [53]. Note that the higher the Type 2 error, the more the preserved off-terrain measurements. Hence, when a filtering algorithm fails in identifying and removing the offterrain points, the generated DTM will be very badly affected. Therefore, most filters focus on minimizing this type of error rather than Type 1 error (removing some terrain measurements in error), as the elevations of the removed measurements can be estimated through the DTM generation process [54]. Therefore, the AWSF algorithm aims to minimize Type 2 error to avoid the consequences of reserving high percentages of off-terrain points, which are much more serious than those of Type 1 error. Tables II and III list the results of filtering the ISPRS LiDAR dataset using the AWSF algorithm compared against the other 14 algorithms' results. The average Type 1 error using the AWSF method is 21.12%, whereas the average Type 1 error using the other filters is 15.45% and ranges from 1.84% to 39.43%. It is noticeable that Type 1 error values vary from one subsample to another. The reason behind these variations is the complexity or simplicity of the features contained in each subsample. For example, subsample 53 contains a quarry, which is a complex feature for any filtering algorithm aims to avoid leaving off-terrain points without filtering. The Type 1 error value has dropped from 53.63% for subsample 53% to 9.56% for subsample 54 as the main features in the previous subsample are small buildings, which might be considered simple and easy to identify by the filtering algorithms. Yet, the elevation values of the removed terrain points could be estimated straightforwardly throughout interpolating the surrounding values for generating the DTM. Obviously, nonfiltered off-terrain points (trees, buildings, etc.) will reduce the accuracy of the generated DTM, as they attract the surrounding values (terrain points) toward them (top of a tree, building's roof, etc.). More importantly, the results show very high accuracy in removing the off-terrain measurements from the raw data. The average Type 2 error using the AWSF algorithm is 1.44%, whereas the average Type 2 error value using the other filters is 5.72% and ranges from 1.58% to 16.11%. Also, the presented algorithm has succeeded in achieving a maximum Type 2 error value of 2.82%, which is the best (lowest value) in comparison with the other algorithms. The average value of the maximum Type 2 error of all the 14 algorithms is 17.63%. Table III illustrates clearly that the maximum Type 2 error of these algorithms ranges from 3.38% to 48.49%. In other terms, the presented filtering method produced the lowest percentage of the nonfiltered LiDAR measurements. This will significantly improve the quality of the generated DTM from the filtered data.
The error value for each subsample could be considered as an important indicator for the method's ability to deal with specific types of feature (big building, bridge, vegetation, slopes, etc.), it is imperative to evaluate the ability of any filtering algorithm for dealing with all kinds of areas and features.
Nevertheless, it is vital for judging any filtering algorithm to look at the average error value for all the subsamples as an indicator for the ability of the method to filter different kinds of areas while maintaining a low value of the error.  The lowest Type 1 error for this subsample 1.18% is achieved by applying Pingel's method. However, by comparing the values of Type 2 error for Subsample 53, it is clear that the AWSF method has minimized this value to 0.28%, whereas Pingel's method has classified 31.97% of the off-terrain measurements as terrain measurements, this will reduce the quality of the generated DTM significantly. Additionally, Table III lists that Elmqvist's method has achieved the lowest Type 2 error value for Subsample 53, which is 0.18%, but the method removed 92.45% of the terrain points (large Type 1 error). Visual comparisons of the planimetric distribution of the LiDAR points before and after the filtering process have been conducted. Fig. 5 shows that raw LiDAR data may have some areas without measurements due to either the absorption of the laser pulses by water surfaces or the unscanned areas between the flight strips [55]. These gaps in raw data may affect the performance of the filter. However, the significance of the visual comparisons comes from displaying the distribution of the removed and reserved LiDAR measurements for each ISPRS sample.
As illustrated in Fig. 5, the AWSF algorithm removed the off-terrain features such as buildings and reserved the ground points that will help in producing an accurate DTM as their distribution covers the whole terrain surface excluding the filtered buildings and other nonground features. The digital surface models (DSMs) of the raw LiDAR data along with the filtered data (DTMs) for all the ISPRS dataset subsamples are shown in Fig. 6.
The statistical and visual results confirm that the AWSF method was successful in filtering raw LiDAR data in complex land covers and difficult scenarios, such as subsample 11 (different features on steep slope), subsample 31 (disconnected terrain), subsample 41 (negative blunders), subsamples 51 and 52 (low vegetation on slopes), and subsample 71 (bridge). Nevertheless, minimizing Type 1 error along with maintaining a very low value for Type 2 error is still a challenge for filtering algorithms. To overcome this issue, further research is needed to either improve the performance of some of the current filtering algorithms or utilize the state-of-the-art machine learning/deep learning methods for enhancing the classification of raw LiDAR data points by reducing both Type 1 and Type 2 errors if possible. In the following section, the generated DTMs will be compared pixel-by-pixel against an accurate reference DTMs to examine the performance of the AWSF algorithm qualitatively. Fig. 7 shows four densely forested areas in the U.K., selected to test the AWSF algorithm performance in various types of forested zones. The specifications of the forests are stated in Table IV, they range from gentle slopes to steep mountainous areas with different tree types and different LiDAR data densities. Although airborne laser scanning technologies have advanced a great deal in the past two decades, raw LiDAR data classification in forested areas is still a very difficult task [3]. Generally, filtering raw LiDAR data for extracting the terrain surface is very useful for the forestry industry. For example, it helps in estimating the wood content in the forest after determining some forest parameters such as canopy height and trees density [10].

B. U.K. Forests Dataset
The AWSF algorithm is applied to process and filter the abovementioned forested areas. Mostly, the off-terrain measurements represent the laser pulses reflected by trees. The algorithm is designed to deal with each LiDAR measurement and classify it into either terrain measurement (ground point) or off-terrain measurement (tree point). After filtering all the data samples, DTMs are generated from the reserved terrain points and compared pixel-by-pixel against very accurate reference DTMs. The reference data are produced by the UK Environment Agency, its reported absolute vertical accuracy is ±0.05 m [56]. The error value is calculated for each point using The accuracy of the AWSF algorithm is estimated by calculating the root-mean-square error (RMSE), minimum error, maximum error, and the mean error for each sample of the forested areas [3] as Another very important comparison has been made; the four forested areas were filtered using professional software designed by the world-leading company in processing remote sensing data, Harris Geospatial Solutions. Part of the ENVI suite of image analysis tools, the ENVI LiDAR tool contains a set of advanced options to filter such forested areas. The filtering algorithm is proprietary to the Company and not published. Table V lists the statistical results of filtering the forested dataset samples using both the AWSF algorithm and ENVI LiDAR tool. The average value of the RMSE using the AWSF algorithm     Table V show the accurate filtering process of the AWSF algorithm.
The above mentioned results for both datasets indicate the robust and stable performance of the proposed algorithm in different scenarios. More specifically, the algorithm processed various densities of raw LiDAR data, starting from low density of 0.18 points/m² to very high density of 4 points/m², and various land cover, urban, rural, and forested areas, to a high degree of accuracy in comparison with the other algorithms.

C. Processing Time of Both Datasets
Processing LiDAR data is a time-consuming task due to the multistage process. This is the case not only for the AWSF algorithm but also for most of the filtering approaches that aim to maximize the reliability of the resulted data. Obviously, the processing time depends on the number of LiDAR measurements to be filtered, grid cell size, data density, etc. The AWSF algorithm's code is written and implemented using MATLAB R2019a software on a Windows 10 operating system. The hardware used is an Intel i5 processor (@3.5 Ghz) and 8 GB of RAM.  Table VI summarizes the time required for filtering all the ISPRS data samples and the U.K. forests samples using the AWSF algorithm. The grid cell size used for the ISPRS data in urban and rural areas is 2 m and 6 m, respectively, and depends on the density of the raw data. For the U.K. dataset, the grid cell size used is 2 m for Southwick, Dartmoor, and Drumtochy samples. Whereas for the very dense forest sample (Caerwent), the grid cell size is set to be 1 m. In addition to the grid cell size, the geographical extent and the amount of raw LiDAR data affect the processing time.
For example, the Caerwent forest sample area is 1 km², which is the same for both Dartmoor and Drumotchy. However, the processing time differs due to the different densities of raw LiDAR data, as the number of input points in Caerwent is four times more than the other two samples. On the other hand, the time required for filtering a similar amount of raw data for sample 2 and sample 6 is 26.56 and 50.70 s, respectively. Taking into account the difference in the grid cell size between both samples (2 m for sample 2 and 6 m for sample 6) that should reduce the processing time, but it is doubled due to the significant difference in areas. Sample 2 covers a small area of 0.26 km², whereas sample 6 covers 1.71 km². Although the AWSF method has focused on improving the quality of the filtered LiDAR data, the processing time is sufficient, as it ranges from 17 s for small areas to 3.21 min for the very dense sample (Caerwent, 4 points/m²), which covers a heavily forested area.

IV. CONCLUSION
This article presents a new algorithm for filtering raw LiDAR data to extract only the terrain measurements. The core of the proposed algorithm, AWSF, is based on interpolating raw LiDAR data using cubic smoothing splines with a predefined smoothing parameter and threshold values. Moreover, the AWSF algorithm presented, as one of its processing stages, an effective automatic approach to remove the negative blunders versus the previous manual methods [51]. After the first iteration, the AWSF allocates a weight value for each point of the raw LiDAR data. The algorithm utilizes the z-shaped function as a weight function to improve the classification process and help the algorithm in making an accurate judgment at a point level.
The performance of the algorithm is tested against two different datasets. The first one contains 15 subsamples in urban and rural areas. The accuracy of the filtered LiDAR measurements is assessed by calculating the percentages of classifying the terrain and off-terrain LiDAR measurements incorrectly. The AWSF succeeded in minimizing the percentage of the reserved offterrain points, which affect the generated DTMs badly [1], [54]. The average percentage of the reserved off-terrain points (over all samples) using the AWSF is 1.44%, which is the best (lowest) when compared against all the other 14 filtering algorithms published between 1998 and 2019; the mean of the average percentage (over all samples) of all the other algorithms is 4.47% with the lowest average being 1.96%. As heavily forested areas on steep terrain represent the most challenging scenario for all filtering algorithms, the second dataset contains four different types of dense forests. The forested areas are filtered not only using the AWSF algorithm, but also using ENVI LiDAR, which is a professional remote sensing software designed by the world-leading company, Harris Geospatial Solutions. Very accurate DTMs are used as a reference data. The filtered LiDAR data are then compared pixel-by-pixel against the reference DTMs to calculate the error value for each point and estimate the accuracy (RMSE). The accuracies of the resultant DTMs of the most difficult forest landscape (steep mountainous terrain) using both the AWSF algorithm and ENVI software are 0.48 m and 0.88 m, respectively. The abovementioned results show clearly the robustness and reliability of the proposed AWSF algorithm for raw LiDAR data filtering.