Background Filtering and Object Detection With a Stationary LiDAR Using a Layer-Based Method

The connected vehicle environment is significant for the future road network, and acquiring real-time data is a prerequisite for constructing it. Recently, Light Detection and Ranging (LiDAR)-based roadside infrastructure has become a prevalent means of obtaining real-time traffic data. However, the raw data collected by LiDAR usually cannot be used directly; processing steps such as background filtering and object detection are necessary before the data can be employed in different applications. This paper proposes a novel layer-based searching method, built on point distribution features, to distinguish moving objects in the point cloud. It aims to address the adverse influence of factors such as congested situations and packet loss. The new approach was evaluated against state-of-the-art methods using field data, and the results showed that the proposed method is more effective than the other methods. The method may also be applicable to other types of rotating LiDAR for improving background filtering performance.


I. INTRODUCTION
Light Detection and Ranging (LiDAR) measures distances to objects by emitting laser pulses and detecting their reflections. Compared to video cameras, LiDAR is more reliable under varying light conditions [1]. Based on the installation form, LiDAR can generally be divided into two types: mobile LiDAR (airborne LiDAR and car-mounted LiDAR) and stationary LiDAR [2], [3]. Stationary LiDAR has recently attracted much attention from researchers and traffic engineers. One primary function of stationary LiDAR is to generate high-resolution micro traffic data (HRMTD) [3]. Specifically, in transportation engineering, HRMTD refers to the trajectories of road users recorded at high frequency (acquisition frequency >10 Hz) [4], [5]. Prevalent traffic applications such as connected vehicles, near-crash analysis, and dynamic traffic signal timing require HRMTD as input [6], [7]. Compared to mobile LiDAR, the advantage of stationary LiDAR stems from its fixed location. Because most background points of a stationary LiDAR appear in consecutive frames, historical information can be used to identify the background. With this feature, the computational load can be greatly reduced compared to mobile LiDAR.
The associate editor coordinating the review of this manuscript and approving it for publication was Feng Xia.
After data collection, data processing is the next step. Currently, an important use of the collected data is object identification. Although existing studies on object detection have used roadside LiDAR data, they suffer from several drawbacks under the following special conditions: 1) congested situations, 2) packet loss, and 3) point drift [8], [9]. Congested situations refer to moving objects stopping at some locations for a period of time [10]. Packet loss means that some points are missing from the space because of network overload or an unstable connection [11]. Point drift means that the points representing the same stationary object have different coordinates in different frames [12]. Previous research often applied density-based algorithms directly under these situations to filter the background and detect objects; however, existing density-based methods may misidentify moving objects as background (irrelevant) points. To address this limitation, a more advanced method is needed to exclude background points and identify objects from roadside LiDAR data.
A LiDAR can emit multiple laser beams at a time, and each laser beam has a different vertical angle. Different layers refer to different laser beams (laser IDs). The idea is that background filtering and object detection may be improved by analyzing the data in each layer. Therefore, the authors proposed a novel layer-based method to remove the background, using a rotating LiDAR for HRMTD collection [13]. The presented layer-based method identifies background points directly by analyzing the distance distribution of points along the horizontal angles of each layer. The remaining points are then clustered with density-based spatial clustering of applications with noise (DBSCAN) for object detection.
The remainder of this paper is organized as follows. Section II reviews related studies. Section III introduces the roadside LiDAR and elaborates on the point distribution features generated by LiDAR scanning. Section IV details the method for background filtering and point clustering. Section V evaluates the performance of the proposed method through case studies. Section VI summarizes the findings of this paper and discusses future extensions.

II. RELATED WORK
The current background filtering approaches for stationary LiDAR can be roughly divided into three groups: rule-based method, volumetric-based method, and point-based method.
The rule-based method is the most prevalent way to filter the background. In this method, a frame without any road users is usually selected as the target frame, and the points in the target frame are defined as background points. In the other frames, points located within a pre-defined distance of the points in the target frame are considered background points and are excluded from the raw LiDAR data. Several studies have used the rule-based method for background filtering. Zhang et al. [14] manually selected one frame without any moving object as the target frame; a point was excluded from the space if its distance to the points in the target frame was less than a threshold. Lee and Coifman [15] developed a similar background subtraction method by assigning a pre-defined distance range to each detected angle and each laser beam. By aggregating multiple target frames, the acceptable distance range of background points at each angle could be obtained. The testing results showed that more than 90% of background points could be successfully excluded. However, the major limitation of rule-based methods is that it is difficult to obtain a target frame without moving objects under heavy traffic conditions [15].
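The basic rule-based idea (a point is background if it lies within a threshold distance of some point in an empty target frame) can be sketched as follows. This is a minimal illustration, not the implementation of [14] or [15]; the 0.1 m threshold and the brute-force search are assumptions chosen for clarity.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres


def subtract_background(frame: List[Point],
                        target_frame: List[Point],
                        threshold: float = 0.1) -> List[Point]:
    """Rule-based filtering: drop points near any target-frame (background) point.

    `threshold` (metres) is a hypothetical value, not one from the cited studies.
    The brute-force nearest-neighbour search is O(n * m); a spatial index such
    as a k-d tree would be needed for full-size frames.
    """
    kept = []
    for p in frame:
        # Distance from p to the closest background point (inf if none exist).
        nearest = min((math.dist(p, q) for q in target_frame), default=math.inf)
        if nearest >= threshold:
            kept.append(p)  # far from all background points: keep as foreground
    return kept


background = [(10.0, 0.0, 0.0), (10.0, 1.0, 0.0)]  # empty-road target frame
frame = [(10.02, 0.0, 0.0), (5.0, 0.5, 0.0)]       # one background return, one vehicle return
print(subtract_background(frame, background))      # [(5.0, 0.5, 0.0)]
```

The sketch also makes the stated limitation concrete: if the target frame itself contains a stopped vehicle, that vehicle's points become "background" and are removed from every later frame.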
The volumetric-based method, also called the raster-based method, converts LiDAR data into a volumetric space through rasterization [16]. It usually divides the space into small cubes after aggregating multiple frames and then calculates the point density in each cube. Since background points accumulate a higher density than moving-object points after aggregation, each cube can be classified as background or non-background using a pre-defined density threshold. Wu et al. [18] first used a fixed, experience-based threshold of point density in each cube for background filtering. They then developed a dynamic threshold that considers the point distribution and the mechanical properties of the LiDAR, namely three-dimensional density-spatial filtering (3D-DSF). The major drawback of 3D-DSF is that vehicles in congested situations are easily misidentified as background points. Lv et al. [16] used the change of point density in each cube across frames to distinguish background from non-background points: any cube whose point density changed by more than 2 points between two adjacent frames was considered non-background. Their results showed that more than 98% of background points could be excluded. However, these methods did not consider packet loss, so background and non-background points might be misclassified when packets are lost.
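The two volumetric rules described above can be sketched on a cube grid: a fixed aggregated-density threshold, and the change-of-density rule with the 2-point limit attributed to Lv et al. [16]. This is an assumption-laden illustration, not 3D-DSF (its dynamic threshold is not reproduced); the cube size and `min_count` are hypothetical parameters.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float, float]
Cube = Tuple[int, int, int]


def to_cube(p: Point, size: float) -> Cube:
    # Integer index of the cube (voxel) of edge length `size` containing p.
    return (math.floor(p[0] / size), math.floor(p[1] / size), math.floor(p[2] / size))


def density(frame: Iterable[Point], size: float) -> Counter:
    # Number of points falling in each cube for one frame.
    return Counter(to_cube(p, size) for p in frame)


def background_cubes(frames: List[List[Point]], size: float, min_count: int) -> Set[Cube]:
    # Fixed-threshold rule: aggregate all frames; cubes holding at least
    # `min_count` points are treated as background.
    total: Counter = Counter()
    for f in frames:
        total.update(density(f, size))
    return {c for c, n in total.items() if n >= min_count}


def changed_cubes(prev: List[Point], curr: List[Point], size: float,
                  max_change: int = 2) -> Set[Cube]:
    # Change-of-density rule: a cube whose count changes by more than
    # `max_change` points between adjacent frames is non-background.
    a, b = density(prev, size), density(curr, size)
    return {c for c in a.keys() | b.keys() if abs(a[c] - b[c]) > max_change}


bg = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2)]
car = [(5.1, 0.1, 0.1), (5.2, 0.2, 0.1), (5.3, 0.3, 0.1)]
print(background_cubes([bg, bg, bg + car], 1.0, 5))  # {(0, 0, 0)}
print(changed_cubes(bg, bg + car, 1.0))              # {(5, 0, 0)}
```

If the car is present in every aggregated frame (a congested situation), `background_cubes([bg + car] * 3, 1.0, 5)` also returns its cube, which is the misclassification the text describes.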
The point-based method identifies background points directly from the point distribution without any conversion. Its computational load is lower than that of the other methods because the points are stored in the raw LiDAR data and can be accessed easily. Qi et al. [19] used a changeable-dimension strategy to store point features and trained a neural network (NN) classifier to identify background points in raw LiDAR data. However, the changeable-dimension method needs further verification, since its practical application has not yet been systematically documented. Zhang et al. [20] developed a laser beam-based method for background filtering using roadside LiDAR. They found that the presence of moving objects shortens the distance between the detected points and the LiDAR compared with the situation in which no moving objects exist in the space. Previous studies have also found that the location of points may drift from the actual location of the object because of environmental factors or the mechanical properties of LiDAR sensors [21]. The beam-based method of Zhang et al. [20] remains vulnerable to point drift and packet loss, and its performance decreases dramatically under packet loss. In summary, existing methods cannot yet effectively subtract background points in complex situations such as congested traffic, packet loss, and point drift, and a more effective background subtraction method is needed.
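The beam-based observation (a moving object shortens the measured distance along a laser beam) can be sketched with a per-beam background table. This is not Zhang et al.'s algorithm: keying by (laser ID, azimuth bin), taking the farthest observed distance as background, and the 0.2° bin and 6 cm margin are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Return = Tuple[int, float, float]  # (laser_id, azimuth_deg, distance_m)
Key = Tuple[int, int]              # (laser_id, azimuth bin)


def key(laser_id: int, azimuth: float, bin_deg: float) -> Key:
    # Group the returns of one laser into fixed-width azimuth bins.
    return (laser_id, int(azimuth // bin_deg))


def learn_background(frames: Iterable[List[Return]], bin_deg: float = 0.2) -> Dict[Key, float]:
    # Assumption: the farthest distance seen in a cell is the background,
    # because a moving object can only shorten the measured distance.
    table: Dict[Key, float] = {}
    for frame in frames:
        for laser_id, az, dist in frame:
            k = key(laser_id, az, bin_deg)
            table[k] = max(table.get(k, 0.0), dist)
    return table


def foreground(frame: List[Return], table: Dict[Key, float],
               margin: float = 0.06, bin_deg: float = 0.2) -> List[Return]:
    # Keep returns clearly closer to the sensor than the learned background.
    out = []
    for laser_id, az, dist in frame:
        bg = table.get(key(laser_id, az, bin_deg))
        # Cells never observed (e.g. lost packets while learning) are skipped;
        # how to handle them is exactly the packet-loss weakness noted above.
        if bg is not None and dist < bg - margin:
            out.append((laser_id, az, dist))
    return out


empty = [(0, 10.1, 20.0), (1, 10.1, 25.0)]
with_car = [(0, 10.1, 12.0), (1, 10.1, 25.01)]
table = learn_background([empty, with_car])
print(foreground(with_car, table))  # [(0, 10.1, 12.0)]
```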
In principle, data points in the same group should have similar properties or features, while data points in different groups should have different properties or features [22]. Based on this principle, the points belonging to one object can be grouped into a cluster. Widely used point clustering methods include K-means, mean-shift clustering, Gaussian mixture models (GMMs), and DBSCAN [23], [24]. The K-means method first selects a number of clusters and initializes their center points. Each data point is then assigned to the group whose center is nearest to it. Based on this assignment, each group center is recalculated as the mean of all vectors in the group [25]. By repeating these steps, the group centers eventually become stable. The biggest disadvantage of K-means is that the number of groups must be known in advance, and the clustering result is significantly influenced by the initially selected centers. Mean-shift clustering is a sliding-window-based algorithm that attempts to find dense areas of data points. It is a centroid-based algorithm that locates the center point of each group by repeatedly updating the center to the mean of the points within the sliding window. The algorithm starts with a circular sliding window of radius r (the kernel) centered at a randomly selected point [26]. It then shifts the kernel iteratively toward higher-density regions in a hill-climbing manner until convergence, that is, until the mean of the points inside the kernel no longer moves the window. Unlike K-means, mean-shift does not require the number of clusters to be selected, because it discovers that number automatically.
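The contrast between the two centroid-based methods can be sketched in a few lines: K-means (Lloyd's algorithm) needs `k` up front, while a flat-kernel mean-shift finds the number of modes itself. This is a minimal 2D illustration; the merge rule for modes closer than r/2 and the iteration limits are assumptions, not part of the cited descriptions.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def mean(points: List[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0) -> List[Point]:
    # Lloyd's algorithm: k is fixed in advance and the result depends on the seed.
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Recompute each center as its group mean; empty groups keep the old center.
        new = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break  # centers are stable
        centers = new
    return centers


def mean_shift(points: List[Point], r: float, iters: int = 100, tol: float = 1e-6) -> List[Point]:
    # Flat-kernel mean-shift: each point climbs to a density mode; modes
    # closer than r/2 are merged, so the cluster count is not preset.
    modes: List[Point] = []
    for start in points:
        c = start
        for _ in range(iters):
            window = [p for p in points if math.dist(p, c) <= r]
            shifted = mean(window)
            if math.dist(shifted, c) < tol:
                break  # the window no longer moves
            c = shifted
        if all(math.dist(c, m) >= r / 2 for m in modes):
            modes.append(c)
    return modes


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(sorted(kmeans(pts, 2)))       # [(0.5, 0.5), (10.5, 10.5)]
print(sorted(mean_shift(pts, 3.0)))  # [(0.5, 0.5), (10.5, 10.5)]
```

The repeated window scan over all points in `mean_shift` is also where its slow computation comes from.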
However, mean-shift clustering is slow because of the sliding-window computation. In GMMs, the data points are assumed to be Gaussian distributed, and two parameters describe the shape of each cluster: the mean and the standard deviation [27]. An optimization algorithm called expectation-maximization (EM) is applied to determine the Gaussian parameters of each cluster. The Gaussian parameters of each cluster are first initialized randomly, and the probability that each data point belongs to each cluster is computed: the closer a point is to a Gaussian's center, the more likely it belongs to that cluster. Based on these probabilities, a new set of parameters that maximizes the likelihood of the data points within the clusters is computed as a weighted sum of the data point positions, with the probabilities as weights. GMMs are also flexible in terms of cluster covariance: because of the standard deviation parameter, clusters can take an elliptical shape rather than being restricted to circles. The major limitation of the GMM algorithm is its heavy computational load; it becomes ineffective when the dimensionality of the problem exceeds a certain level. Another disadvantage is that the user needs to set the number of mixture components to fit the training dataset. In many cases, however, users do not know how many components should be used and may have to fit several different mixture models to find the one that best suits their classification problem. DBSCAN uses two important parameters for clustering points: epsilon (Eps) and minimum points (MinPts) [28]. Eps is the radius of the neighborhood, and MinPts is the minimum number of neighbors within Eps. If the number of neighbors of a point is greater than or equal to MinPts, the point is marked as a core point.
If the number of neighbors of a point is less than MinPts but the point lies in the neighborhood of a core point, it is marked as a border point. A point that is neither a core point nor a border point is called a noise point [29]. Since moving objects retain a high point density after background filtering and the number of moving objects is not known in advance, DBSCAN is well suited to clustering LiDAR points. Several studies have applied different variants of DBSCAN for point clustering. Cui et al. [17] first applied the traditional DBSCAN to roadside LiDAR point clustering and found that fixed Eps and MinPts values usually could not successfully cluster points far from the sensor. Chen et al. [28] developed a revised DBSCAN for deer identification using roadside LiDAR, with an error rate of less than 8%. Nevertheless, the computational delay remained a problem for collecting real-time HRMTD [29]. Zhao et al. [30] developed a revised DBSCAN algorithm with adaptive parameters that consider the point distribution in space and the mechanical features of the LiDAR sensor, and found that it greatly improved clustering accuracy compared with the traditional DBSCAN. It should be noted that DBSCAN suffers from a high computational load, so further efficiency improvements in clustering are still needed to reduce the time delay.
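The core, border, and noise definitions above translate directly into the classic DBSCAN procedure. The sketch below uses fixed Eps and MinPts (so it is the traditional variant, not the adaptive one of Zhao et al. [30]) and, following the common convention, counts a point as its own neighbor. Its O(n²) neighborhood queries also illustrate why computational load is a concern.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1


def region(points: Sequence[Point], i: int, eps: float) -> List[int]:
    # Indices of all points within eps of point i, including i itself.
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster ID, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(points, i, eps)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may still become a border point later
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_seeds = region(points, j, eps)
            if len(j_seeds) >= min_pts:
                queue.extend(j_seeds)  # j is also a core point: keep expanding
    return labels  # type: ignore[return-value]


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```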

III. DATA
This paper employed an RS-LiDAR-32 for data collection and algorithm validation. The RS-LiDAR-32 sensor uses 32 infrared (IR) lasers paired with IR detectors to measure distances to objects. The device is usually installed in a compact, weather-resistant cabinet, and the LiDAR rotates rapidly within its fixed housing to scan the surrounding environment and provide real-time 3D point data. The whole sensing system includes desktop/laptop computers, a GPS antenna (optional), interface boxes, LiDAR sensors, and a DC power supply. The RS-LiDAR-32 applies the time-of-flight (ToF) principle: when an IR laser emits a pulse, its time of shooting and direction are registered. The pulse may hit an object, which reflects some of the energy back toward the LiDAR. A portion of that energy is received by the paired IR detector and registered as the time of acquisition and the received power. The sensor's rotation speed can be set to 300, 600, or 1200 revolutions per minute (RPM). A two-byte azimuth value (α) follows the flag bytes at the beginning of each data block. The azimuth is an unsigned integer representing the angle in hundredths of a degree; for instance, a raw value of 27742 should be interpreted as 277.42°. Only one azimuth value is reported per data block. The major features of the RS-LiDAR-32 are summarized in Table 1. The LiDAR can be mounted temporarily on a tripod for data collection or permanently at a fixed location such as a traffic signal pole for long-term data collection. For roadside LiDAR deployment, the recommended mounting height is 7-9 ft above the ground, considering the horizontal field of view (FOV) and the occlusion issue. Figure 1 shows an example of the RS-LiDAR-32 and its customized carrier, and Figure 2 shows two different data collection methods.
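The azimuth encoding described above, and the conversion of a (distance, azimuth, vertical angle) return into the Cartesian coordinates used for plotting, can be sketched as follows. Both the big-endian byte order and the axis convention are assumptions for illustration; the sensor's protocol manual should be checked before relying on either.

```python
import math
import struct
from typing import Tuple


def decode_azimuth(two_bytes: bytes) -> float:
    # Assumption: unsigned 16-bit, big-endian, in hundredths of a degree.
    (raw,) = struct.unpack(">H", two_bytes)
    return raw / 100.0


def to_cartesian(distance: float, azimuth_deg: float,
                 vertical_deg: float) -> Tuple[float, float, float]:
    # Assumed axis convention: azimuth measured from the +y axis toward +x,
    # vertical angle measured up from the horizontal plane, z pointing up.
    a, w = math.radians(azimuth_deg), math.radians(vertical_deg)
    horizontal = distance * math.cos(w)  # projection onto the ground plane
    return (horizontal * math.sin(a), horizontal * math.cos(a), distance * math.sin(w))


print(decode_azimuth(bytes([0x6C, 0x5E])))  # 0x6C5E == 27742 -> 277.42
```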
To illustrate the point distribution features of the roadside LiDAR, two adjacent frames (Frame 1 and Frame 2) without moving objects were selected, and the distances of the scanned points were compared at each horizontal angle. Figure 3 shows the distance offset distribution over the 360-degree horizontal FOV for different laser IDs.
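Comparing two frames "at each horizontal angle" requires matching returns of the same laser at (nearly) the same azimuth. A minimal sketch, assuming returns are matched by laser ID and a hypothetical 0.2° azimuth bin, computes the signed offset per cell; cells missing from either frame are skipped instead of being filled in.

```python
from typing import Dict, List, Tuple

Return = Tuple[int, float, float]  # (laser_id, azimuth_deg, distance_m)
Cell = Tuple[int, int]             # (laser_id, horizontal-angle bin)


def to_grid(frame: List[Return], bin_deg: float = 0.2) -> Dict[Cell, float]:
    # Index each return by laser ID and nearest horizontal-angle bin.
    return {(lid, round(az / bin_deg)): d for lid, az, d in frame}


def distance_offsets(frame1: List[Return], frame2: List[Return],
                     bin_deg: float = 0.2) -> Dict[Cell, float]:
    # Signed offset (frame2 - frame1) per cell; a negative value means the
    # return moved closer to the sensor. Cells missing from either frame,
    # e.g. because of lost packets, are skipped rather than guessed.
    g1, g2 = to_grid(frame1, bin_deg), to_grid(frame2, bin_deg)
    return {c: g2[c] - g1[c] for c in sorted(g1.keys() & g2.keys())}


f1 = [(17, 10.0, 20.0), (17, 10.2, 20.0), (20, 10.0, 30.0)]
f2 = [(17, 10.0, 20.02), (17, 10.2, 15.0)]  # laser 20 return lost in frame 2
print(distance_offsets(f1, f2))  # {(17, 50): ~0.02, (17, 51): -5.0}
```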
As Figure 3 shows, for most background points the offset of a point between the two frames was insignificant. For some laser beams, such as IDs 17, 20, and 24, however, there were many randomly located points with large distance offsets. Further inspection of the LiDAR video showed that these random points were caused purely by noise or by dynamic background (such as swinging tree branches). To identify the influence of moving objects on the point distribution, a frame without moving objects in a laser beam and another frame with moving objects in the same laser beam were selected for comparison. The point distribution is shown in Figure 4, from which it can be seen that the detected distances to moving objects were shorter than the distances to the background (without moving objects) along the direction of the laser beam.
A further investigation examined the distance offset patterns of moving objects and background objects in the space. As shown in Figure 5 (b) and (c), when moving objects were scanned by the laser beam, the distance offset was continuous over several adjacent angles, whereas for fixed or dynamic background, the distance offset appeared as flat lines or randomly fluctuating lines. Therefore, the point distribution patterns differ between moving objects and background objects. In general, compared with background objects, the point distribution of moving objects is "curve sinking" (because the moving objects are closer to the LiDAR than the background) and "continuous." This paper used this feature to distinguish moving objects from background points.

IV. METHODOLOGY
This section introduces a novel method for background filtering and point clustering. The method can exclude background points and identify objects simultaneously, without strictly separating background filtering from point clustering.
As mentioned before, the point offset distribution is the major feature used to distinguish background from non-background points. The normal distance offset caused by vibration is denoted by ϒ, which is determined by the distance resolution of the LiDAR. The number of continuous points (NCP) with offsets higher than the threshold is another factor to be considered. A low NCP threshold may cause dynamic background points to be misidentified as moving-object points under point drift conditions, whereas a high NCP threshold may cause small moving objects (such as pedestrians) to be misidentified as background points. For the RS-LiDAR-32, ϒ is 3 cm (representing an offset of ±3 cm), meaning that if the offset of a point between two adjacent frames is less than 6 cm (2ϒ), the point is identified as a dynamic background point. For NCP, a threshold of 3 was selected based on the analysis of Lv et al. [16]. Therefore, a moving object should meet the criterion in Equation (1), which states that the offsets of at least 3 consecutive points are larger than 6 cm:

$$\left| d_t(\theta_j) - d_{t-1}(\theta_j) \right| > 2\Upsilon, \qquad j = k, k+1, \ldots, k+\mathrm{NCP}-1, \qquad \mathrm{NCP} \geq 3 \tag{1}$$

where θ_j is the j-th horizontal angle at one layer, d_t(θ_j) is the distance measured at θ_j in frame t, and ϒ is the normal distance offset. However, checking NCP at only one layer may miss some critical information. For example, assume that 5 continuous points exist in one layer from an angle θ to θ + 0.08 degree; all of them will be identified as moving-object points (because NCP = 5 > 3). In the adjacent layer, only 2 continuous points (NCP = 2 < 3) belong to the same moving object, and in another layer only one point (NCP = 1 < 3) belongs to it, as shown in Figure 6. The reduced NCP may result from occlusion, packet loss, or other issues. If the method in (1) is deployed, the points in layers b and c will be identified as background points. Another special case is that the NCP of each layer (a, b, c) is 2, as shown in Figure 7. This NCP distribution pattern can be regarded as a pedestrian whose height is greater than their width. By Equation (1), the points of this moving object (the pedestrian) are misidentified as background, since NCP = 2 < 3 at each layer.

VOLUME 8, 2020
FIGURE 5. Distance offset distribution along the horizontal angle: (a) distance offset between two frames without moving objects, (b) distance offset between one frame with one moving object and one frame without moving objects, (c) distance offset between one frame with six moving objects and one frame without moving objects.
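The single-layer criterion in Equation (1) amounts to finding runs of consecutive large offsets along one layer. The sketch below uses ϒ = 3 cm and NCP = 3 from the text; the function name and the list-of-offsets input format are hypothetical.

```python
from typing import List


def moving_mask(offsets: List[float], upsilon: float = 0.03, ncp: int = 3) -> List[bool]:
    """Mark points that satisfy Equation (1) along one layer.

    offsets[j] is the distance offset (metres) at the j-th horizontal angle of
    a layer; a point is kept if it belongs to a run of at least `ncp`
    consecutive points whose |offset| exceeds 2 * upsilon.
    """
    mask = [False] * len(offsets)
    j = 0
    while j < len(offsets):
        if abs(offsets[j]) <= 2 * upsilon:
            j += 1  # small offset: background candidate
            continue
        start = j
        while j < len(offsets) and abs(offsets[j]) > 2 * upsilon:
            j += 1  # extend the run of large offsets
        if j - start >= ncp:
            for i in range(start, j):
                mask[i] = True
    return mask


# A 3-point run is accepted; the trailing 2-point run (like one layer of the
# pedestrian in Figure 7) is rejected as background.
print(moving_mask([0.01, -0.5, -0.6, -0.55, 0.0, -0.4, -0.4, 0.02]))
# [False, True, True, True, False, False, False, False]
```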
Note that layers a, b, and c represent three adjacent laser beams, and the dots in each layer in Figures 6 and 7 represent consecutive points with an offset exceeding the threshold. As Figures 6 and 7 show, the number of points at the same angle in adjacent layers (NPAL) should also be considered as a factor in background filtering. Because of occlusion or point drift, the points representing the same object in adjacent layers may not be captured at exactly the same angle. Therefore, the critical problem is how to determine the neighbors of a point, and searching for neighbors within a single layer is insufficient. To solve this problem, a layer-based searching method was developed, as illustrated in Figure 8. The triangles in Figure 8 are dummy points serving as bridges: if two points can be connected through a bridge, they can be counted toward NPAL even if they are not at the same angle. Consequently, a point in one layer can have at most eight neighbors.
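One plausible reading of "at most eight neighbors" is a 3×3 window on a (layer, horizontal-angle bin) grid, where the diagonal links play the role of the bridges in Figure 8. The sketch below assumes that reading, a hypothetical number of azimuth bins, and wrap-around at 360°; `layer_counts` is a hypothetical stand-in for the per-layer neighbor count used in (2), not the paper's definition.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]  # (layer index, horizontal-angle bin)


def neighbors(cell: Cell, n_layers: int, n_bins: int) -> List[Cell]:
    # Up to eight cells around `cell`: the same layer at adjacent angles plus
    # the adjacent layers at the same or adjacent angles (assumed bridges).
    layer, k = cell
    out = []
    for dl in (-1, 0, 1):
        if not 0 <= layer + dl < n_layers:
            continue  # no layer above the top beam or below the bottom beam
        for dk in (-1, 0, 1):
            if (dl, dk) != (0, 0):
                out.append((layer + dl, (k + dk) % n_bins))  # azimuth wraps at 360 deg
    return out


def layer_counts(cell: Cell, flagged: Set[Cell], n_layers: int, n_bins: int) -> Dict[int, int]:
    # Hypothetical per-layer count: flagged neighbors of `cell` in each layer.
    counts: Dict[int, int] = {}
    for nb in neighbors(cell, n_layers, n_bins):
        if nb in flagged:
            counts[nb[0]] = counts.get(nb[0], 0) + 1
    return counts


# Figure 7 pattern: NCP = 2 in each of three adjacent layers (a pedestrian).
pedestrian = {(0, 10), (0, 11), (1, 10), (1, 11), (2, 10), (2, 11)}
print(layer_counts((1, 10), pedestrian, 32, 1800))  # {0: 2, 1: 1, 2: 2}
```

Although no single layer reaches NCP = 3, the middle point sees five flagged neighbors across the three layers, which is the information the single-layer check in (1) discards.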
The resolution of the LiDAR decreases as the distance between the object and the sensor increases. The searching strategy in Figure 8 can be further expressed as (2),
where $n_A^i$ is the number of points among point A's neighbors in layer i.
DBSCAN is applied for point clustering as follows; its specific implementation follows Zhao et al. [30]. For a point A, if it satisfies (2), then A and its neighbors are considered core points, and all core points connected in this way are included in one group. With this strategy, all points in Figures 6 and 7 are grouped correctly. Since every searched point must have an offset larger than 6 cm (2ϒ), the number of raw LiDAR points passed to DBSCAN is greatly reduced, which significantly increases computational efficiency.
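The grouping step can be sketched as a DBSCAN-style expansion over flagged grid cells, reusing the assumed 3×3 neighborhood. Because Equation (2) is not reproduced here, the core-cell rule (at least `min_neighbors` flagged neighbors) is a hypothetical stand-in, and the function names are illustrative only.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]  # (layer index, horizontal-angle bin)


def neighbors(cell: Cell, n_layers: int, n_bins: int) -> List[Cell]:
    # Assumed 3x3 neighborhood on the layer/angle grid (at most eight cells).
    layer, k = cell
    return [(layer + dl, (k + dk) % n_bins)
            for dl in (-1, 0, 1) for dk in (-1, 0, 1)
            if (dl, dk) != (0, 0) and 0 <= layer + dl < n_layers]


def group_cells(flagged: Set[Cell], n_layers: int, n_bins: int,
                min_neighbors: int = 2) -> Dict[Cell, int]:
    """Label flagged cells with object IDs; unlabeled flagged cells are noise.

    Hypothetical stand-in for (2): a cell is a core cell if at least
    `min_neighbors` of its grid neighbors are flagged.
    """
    def nbrs(c: Cell) -> List[Cell]:
        return [n for n in neighbors(c, n_layers, n_bins) if n in flagged]

    core = {c for c in flagged if len(nbrs(c)) >= min_neighbors}
    labels: Dict[Cell, int] = {}
    next_id = 0
    for seed in sorted(core):
        if seed in labels:
            continue
        labels[seed] = next_id
        queue = deque([seed])
        while queue:
            c = queue.popleft()
            for n in nbrs(c):
                if n not in labels:
                    labels[n] = next_id  # core or border cell joins the group
                    if n in core:
                        queue.append(n)  # only core cells expand the group
        next_id += 1
    return labels


# Figure 6 pattern (NCP = 5, 2, 1 in layers a, b, c), a second object, and one stray cell.
fig6 = {(0, 10), (0, 11), (0, 12), (0, 13), (0, 14), (1, 11), (1, 12), (2, 12)}
other = {(5, 100), (5, 101), (6, 100), (6, 101)}
labels = group_cells(fig6 | other | {(10, 500)}, 32, 1800)
print(sorted(set(labels.values())), (10, 500) in labels)  # [0, 1] False
```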

V. EVALUATION
Data collected at three sites under different scenarios were used to illustrate the performance of the proposed layer-based algorithm: a freeway segment, an intersection, and a rural arterial segment. The major features of the three sites are shown in Table 2. The point clouds were plotted in Cartesian coordinates. Figure 9 shows a frame before and after data processing on the freeway segment (Site 1). There were ten vehicles in the selected frame; after the layer-based method was applied, all ten vehicles were kept, and background points were successfully excluded. Figure 10 shows a frame before and after data processing at a signalized intersection (Site 2). The selected frame contained six vehicles in different directions, representing a congested situation. After the layer-based method was applied, all six vehicles were kept, and all background points were successfully excluded. Figure 11 shows a frame before and after data processing on a rural arterial segment (Site 3). Seven vehicles were detected in the selected frame; after the layer-based method was applied, all vehicles were kept, and background points were successfully excluded.
To further evaluate the developed method, the layer-based algorithm was compared with state-of-the-art methods: the density-based method (3D-DSF, proposed by Wu et al. [18]) and the raster-based method (RA, developed by Lv et al. [16]). The results are shown in Table 3. The proposed layer-based method (LB) performed better than the other two methods at all three sites. Since computational time is important for traffic behavior and safety analysis [31], the filtering time of this method was also compared with that of 3D-DSF. Statistical analysis showed that the filtering speed of this method is only about 3% slower than that of 3D-DSF, which meets real-time requirements in practice. It should be noted that 3D-DSF requires more pre-processing time than the LB method.
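The exact metric behind Table 3 is not defined in this section. As an assumption, the sketch below computes the two quantities quoted in the conclusion (share of moving-object points kept and share of background points excluded) from per-point ground-truth labels and filter decisions; the function name and input format are hypothetical, and the example labels are made up for illustration, not data from the study.

```python
from typing import Sequence, Tuple


def filtering_rates(is_object: Sequence[bool], kept: Sequence[bool]) -> Tuple[float, float]:
    """Return (object-point retention rate, background exclusion rate).

    is_object[i]: ground truth, True if point i belongs to a moving object.
    kept[i]: filter output, True if point i survived background filtering.
    """
    obj_total = sum(is_object)
    bg_total = len(is_object) - obj_total
    obj_kept = sum(1 for o, k in zip(is_object, kept) if o and k)
    bg_removed = sum(1 for o, k in zip(is_object, kept) if not o and not k)
    retention = obj_kept / obj_total if obj_total else float("nan")
    exclusion = bg_removed / bg_total if bg_total else float("nan")
    return retention, exclusion


# Illustrative labels only: 3 of 4 object points kept, both background points removed.
print(filtering_rates([True, True, True, True, False, False],
                      [True, True, True, False, False, False]))  # (0.75, 1.0)
```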

VI. CONCLUSION
This paper developed a new method to exclude background points and identify objects from roadside LiDAR data. The proposed method uses the point distribution pattern across different layers as a novel feature for moving-object identification. Based on this feature, a layer-based searching method was proposed to address the adverse influence of factors such as congested situations and packet loss. The evaluation showed that the proposed method provides higher accuracy than state-of-the-art methods: it kept more than 96% of moving-object points and effectively excluded all background points from the space. The computational time of the proposed method meets practical application needs (only about 3% slower than 3D-DSF). Furthermore, the method can be extended to other types of rotating LiDAR after the parameters are properly calibrated.
More data are required to further investigate the performance of the developed method. The computation time of the proposed method also needs to be recorded and quantified. The next step of this research is to analyze the influence of different weather conditions on the effectiveness of the proposed method.