A Fast Obstacle Detection Method by Fusion of Double-Layer Region Growing Algorithm and Grid-SECOND Detector

Adjacent obstacles are difficult to be distinguished, and remote obstacles are detected easily to split. Besides, limited deep learning samples easily result in missed detection of obstacles in urban environment. In view of this, a fast and robust detection method is proposed by fusing Double-Layer Region Growth algorithm and Grid-SECOND detector. At first, SECOND detector is improved by replacing voxel grids with 2D grids and adopting multi-dimensional features to detect obstacles, which can reduce the time consumption and ensure the accurate detection of remote obstacles. Then, the first Region Growing algorithm is used to cluster the undetected and non-empty grids, which can detect obstacles outside the training set. At last, the second Region Growing algorithm is used to refine the detection results of obstacles with larger volume and multi-obstacles grids, and complete the obstacle detection. Through testing in our extracted urban dataset and KITTI dataset, it is verified that the proposed method outperforms state-of-the-art methods and can accurately achieve obstacle detection. The average duration of the entire process is about 50ms.


I. INTRODUCTION
Obstacle detection in urban environment has long been the focus of the research on unmanned driving environment perception. The commonly used sensors for obstacle detection include LIDAR, millimeter wave radar and camera. Due to the fact that the camera is greatly affected by light and cannot be used all day long, this study adopts LIDAR characterized by high measurement accuracy and strong anti-interference ability [1]. The purpose of obstacle detection based on LIDAR is to distinguish the point clouds of different obstacles from raw point clouds, and provide necessary basis for the following work such as obstacle classification [2], tracking [3] and LIDAR localization [4], [5].
In general, obstacle detection methods based on LIDAR are currently divided into two categories based on traditional clustering [6], [7] and deep learning [8], [9], respectively.
The study of traditional clustering methods is relatively early due to its strong stability and real-time performance. For example, The associate editor coordinating the review of this manuscript and approving it for publication was Zhenyu Zhou . Nurunnabi et al. [10] adopted the distance from the point to its nearest fitting plane and the change of local surface normal direction for obstacle detection. However, this method has a strong dependence on the initial cluster center, and it is difficult to distinguish adjacent obstacles. Feng et al. [11] proposed an Ordering Points To Identify The Clustering Structure (OPTICS) method which replaces the core point with the center point of the grid, and optimizes Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method. Compared with traditional DBSCAN, this method has higher detection accuracy and reduces time consuming. But it is easy to detect splitting for remote obstacles. Desheng et al. [12] proposed an Eight-neighbor grids clustering method. Although this method can detect obstacles quickly, it still cannot distinguish adjacent obstacles accurately. The adaptive K-means algorithm is used by Benyue et al. [13] to detect obstacles. The method can improve the problem of Xie's work [12]. But it is still easy to detect splitting for remote obstacles and has poor real-time performance.
With the continuous development and progress of deep learning technology, the deep learning detection methods based on LIDAR have gradually begun to be widely studied VOLUME [15] had done a lot of researches on local feature extraction, and proposed a network structure PointNet++. The network adopted the idea of hierarchical feature extraction to solve the problem of uneven sampling density of point clouds. Li et al. [16] proposed an obstacle detection method based on Full Convolution Neural Network (FCN). This network can detect the obstacles through a large number of training data set without artificial target features, but it has poor real-time performance. Zhou and Tuzel [17] of Apple Company proposed an end-to-end network VoxelNet, which can improve the problem of Su's work [13]. However, it has poor performance of direction estimation. In order to obtain the accurate orientation of obstacles, Yan et al. [18] proposed a Sparsely Embedded Convolutional Detection (SECOND), which can make full use of the rich 3D information present in point cloud data. Shi et al. [19] proposed a new two-stage 3D detection framework PointRCNN with high robustness and accurate 3D detection performance. However, it is still timeconsuming. Although the above deep learning networks can solve the problem of sparsity of point cloud and realize the accurate detection of remote obstacles, it is difficult to detect the obstacles outside the training dataset. In order to improve the detection accuracy and solve the problems suffered by the above methods, a fast and robust obstacle detection method is proposed in this paper. Firstly, the point clouds after road segmentation are rasterized and the grids in Region Of Interest (ROI) are filtered. Secondly, SECOND detector is improved by replacing voxel grids with 2D grids and adopting multi-dimensional features to detect obstacles, which can solve the problem of remote obstacle detection splitting. After that, the first Region Growing algorithm is adopted to cluster the undetected and non-empty grids, which can solve the missing detection problem caused by the limited dataset. Finally, the second Region Growing algorithm is adopted to refine the obstacles with larger volume and multi-obstacles grids detected by the first Region Growing algorithm, and effectively distinguishes the adjacent obstacles.
The main contributions of this paper are summarized as follows.
1. Grid-SECOND detector is proposed to improve SECOND detector by adopting multi-dimensional features and replacing voxel grids with 2D grids. The Grid-SECOND detector can reduce the time consumption and improve the original detection accuracy.
2. Double-Layer Region Growing Algorithm is proposed to solve the problem of difficult to distinguish adjacent obstacles, while ensuring stable detection of obstacles.
3. This paper adopts Double-Layer Region Growing Algorithm to supplement the Grid-SECOND detector and solve the problem of obstacles outside the training set undetected.
The remainder of the paper is structured as follows. Point cloud data preprocessing is in Section II. In Section III, we respectively introduce three methods: Double-Layer Region Growing, Grid-SECOND detector, and a fast detection method by fusion of Double-Layer Region Growing algorithm and Grid-SECOND detector. Section IV conducts experiments on our extracted urban dataset and KITTI data set respectively and analyses the results. At last, Section V presents some conclusions and challenges.

II. DATA PREPROCESSING
HDL-64 LIDAR is adopted in this paper to improve the ability of obstacle detection in urban environment, which can generate nearly 130000 points per frame. If the point clouds are processed directly, it is difficult to guarantee realtime performance. Therefore, to improve the detection speed and accuracy, this study adopts the method utilized by Jiong et al. [20] to complete the road segmentation, as shown in Figure 1. The blue points represent the ground points and the red points represent the obstacle points. Then, we adopt a 40 cm×40cm grid to rasterize the point clouds, thus creating a 200 × 200 grid occupancy map, as shown in Figure 2. The center point of the grid is then transformed from LIDAR coordinate to Geodetic Coordinate System(GCS) using equation (1). At last, we judge whether the center of grid is in the high-precision map composed of quadrilateral by equation (2), and complete ROI filtering.
where, P t is coordinate values of grid center in GCS. P 1 , P 2 , P 3 , P 4 denote quadrilateral four vertices, respectively. T t , R t denote translation matrix and rotation matrix from LIDAR coordinate to GCS at time t,respectively.  The production process of high-precision map is as follows: first of all, the method of Dongbin et al. [21] is used to complete reflectivity calibration of LIDAR. After that, the method of Wen et al. [22] is adopted to overlay ground point cloud. Next, Josm map editing software is used to plot a high-precision map [22] on the superimposed gray-scale map, as shown in Figure 3.

A. DOUBLE-LAYER REGION GROWING
As an image segmentation algorithm, Region growing algorithm can segment the connected regions with the same features, thus ensuring good edge information. Although this algorithm is simple and fast, it may easily lead to over-segmentation of obstacles when the grid is smaller, and it is difficult to distinguish adjacent obstacles when the grid is larger. Therefore, a Double-Layer Region Growing algorithm is proposed in this paper, which first uses larger grids to avoid obstacle over-segmentation, and then uses smaller grids to select obstacle that are characterized by large volume and  may contain multiple targets for refinement. The structure of the algorithm is shown in Figure 4 and 5.
Step 1: Firstly, traversal flag of the grid F visit is initialized to 0. Then, equation (3), (4) and (5) are used to calculate the height difference h, barycenter b i and variance value σ i for each grid in ROI. The grid with h greater than height difference threshold T h (T h = 0.4) is selected as the seed grid. The grid with σ i greater than the variance threshold where, h max , h min denote the maximum and minimum height of points in the grid, respectively. p j = p jx , p jy , b i = b ix , b iy denote point j and the barycenter in the grid i, respectively. σ i is the variance of points in grid i.
Step 2: 8-neighbor grids extended clustering. We assume that grid A in Figure 6 is the seed grid. The 8-neighbor grids of the grid A are searched counterclockwise, as shown in Figure  6. The grid whose F visit is 0 is processed as follows: 1) If h > T h in grid B, the grid is an obstacle grid. In addition, if the maximum height difference between grid B VOLUME 9, 2021 and seed grid A is less than the height difference threshold of the same obstacle T ch (T ch = 0.3), grid A and grid B belong to the same obstacle. They are thus added to the same obstacle list and F visit of grid B is 1. Otherwise, grid B is skipped and F visit is unchanged. 2 After traversing the 8-neighbor grids in turn, the succeeding grid of the current seed grid in the obstacle list is used as the new seed grid for 8-neighbor search. All the grids are traversed to complete the obstacle detection.
Step 3: Obstacle refinement. If the clustered obstacles have large volume and multi-obstacles marking grids to satisfy the requirements of equation (6), 20cm×20cm grid and T ch = 0.15 are used to perform Step 2 again to refine them. ob g_n ≥ T g_n ob m_n ≥ T m_n (6) where, ob g_n , ob m_n are the number of grids and the number of grids marked as multi-obstacles, respectively. T g_n , T m_n are thresholds for number of grids requiring second clustering and number of multi-obstacle grids (T g_n = 15, T m_n = 4), respectively B. GRID-SECOND DETECTOR SECOND detector is a derivative of VoxelNet, which can solve the problem of sparse point cloud of remote obstacles, and realize the stable detection of obstacles without splitting. Inspired by the SECOND Detector, a Grid-SECONG detector is proposed in this paper on the basis of its main structure. The Grid-SECONG uses multi-dimensional features and replaces voxel grids with 2D grids to improve SECOND detector, thus improving the original detection accuracy and reducing the time consumption.
Step 1: We first adopt the method of data preprocessing to divide the point clouds into grids with 40 cm. Then, mean heigh h mean , maximum height h max , mean intensity I mean , maximum intensity I max , the number of point clouds N count , the angle A dir of the center point relative to the origin, the distance D ctr between the center point and the origin, and the occupation flag F non per grid are calculated. At last, these multi-dimensional features of the grid are input to the network as 8 channels for gridlwise feature extraction.
Step 2: We treat the 2D grid as a voxel grid with the height of 1. Based on the voxel feature encoding (VFE) layer as described in reference 17, we adjust the form of input tensor to construct grid feature encoding (GFE) for extracting gridwise features. A GFE layer takes only one tensor with 8 channels in the same grid as the input point, and uses a fully connected network (FCN) consisting of a linear layer, a batch normalization (BatchNorm) layer and a rectified linear unit (ReLU) layer to extract gridwise features. Then, it uses elementwise max pooling to obtain the locally aggregated features for each grid. As a whole, the gridwise feature extractor consists of several GFE layers and an FCN layer.
Step 3: Sparse Conv Layers, as described in reference 18 is adopted in this paper to increase the speed of both training and inference. It consists of two phases of sparse convolution. Each phase contains several submanifold convolutional layers and one normal sparse convolution to perform downsampling in the z-axis. When the z-dimensionality is downsampled to one or two, the sparse 3D data will be converted into a 2D BEV image.
Step 4: We adopt an RPN architecture composed of three stages in reference 18. Each stage is followed by several downsampled convolutional layers. After each convolutional layer, BatchNorm and ReLU layers are applied. Then, we upsample the output of each stage to a feature map of the same size and concatenate these feature maps into one feature map. Finally, three 1 × 1 convolutions are applied for the prediction of class, regression offsets and direction.

C. A FAST DETECTION METHOD BY FUSION OF DOUBLE-LAYER REGION GROWING AND GRID-SECOND DETECTOR
Though Double-Layer Region Growing algorithm can quickly distinguish adjacent obstacles, it is easy to detect splitting for remote obstacles. Grid-SECOND detector adopts the sparse distribution of point clouds for training, and can solve the problem of remote obstacle detection splitting. However, due to the limitation of training obstacle types, when the obstacles are close or not involved in training set, these obstacles are easily missed. This paper combines the above two detection methods, and proposes a more robust and faster detection method. The structure of the method is shown in Figure 8. In addition, the specific steps are as follows.
Step 1: To start, the preprocessed grids are processed as follows according to the steps of the Grid-SECOND detector: Calculate 8 channels features per grid; Extract grid features; Convert sparse 3D data into a 2D BEV image; Apply RPN for  the prediction of class, regression offsets and direction. Then, the first obstacle detection is completed.
Step 2: Next, considering that there may be adjacent obstacles or obstacles that are not in the training set in urban environment, the Grid-SECOND detector may fail to detect them. Therefore, the non-empty and undetected grids are selected from the grid map in the Step1, and the Double-Layer Region Growing algorithm is used to cluster these grids according to the steps of III-A. Finally, the obstacles which aren't detected by Grid-SECOND is detected by twice Regional Growth algorithm.

IV. EXPERIMENT
At first, according to the loss function definition and training method in reference 18, this paper uses KITTI data set which contains 7481 training point clouds to complete the Grid-SECOND detector model training. Then, combined with the Double-Layer Region Growing algorithm, the following experimental tests are carried out.
This experiment consists of two parts: Part A is the experiment on typical urban dynamic. LiDAR data is collected by Velodyne HDL-64E. It includes adjacent obstacles, remote obstacles and obstacles that are not involved in the training set. Part B is the experiment in the public test dataset provided by KITTI [23]. During both experiment parts, the methods for processing the LiDAR data are run on a computer with i7-8700 processor, RTX2070 graphics card and 16G RAM. The system environment is ubuntu16.04 and the development environment is QtCreator 5.8. Besides, the software is written by C++.

A. EXPERIMENT ON URBAN DYNAMIC SCENES
In this paper, the obstacle with more than 15 grids (40cm×40cm) is defined as large obstacle. Then, 5000 frames are collected from the urban dynamic environment and are marked manually, including 15036 large obstacles. This paper uses Region Growing algorithm [12] and SECOND [18] detector to process the 5000 frames, respectively and compares the detection results with the proposed method.

1) QUALITATIVE ANALYSIS
In order to illustrate the performance of the proposed method, we selected three typical urban dynamic scenes for the analysis. The raw point clouds of the three scenes are shown in Figure 9(a), Figure 11(a) and Figure 13(a). The point clouds after ROI filtering of the three scenes are shown in Figure 9(b), Figure 11(b) and Figure 13(b). The detection results of the three scenes corresponding to the three methods are shown in Figure 10, Figure 12 and Figure14 respectively. Different obstacles detected by the SECOND detector and the Grid-SECOND detector are represented by different colors, such as purple for vehicles, red for unknown, and blue for pedestrians. The obstacles detected by the Region Growing algorithm is randomly colored.
Scene A1 contains a remote vehicle as shown in Figure 9 (A box), and local 3D display of A box is shown in Figure 9(c). Since the vehicle is far from the LIDAR, the point VOLUME 9, 2021  clouds at the front of the vehicle are dense and the point clouds at the top of the vehicle are sparse. The SECOND [18] detector and the proposed method use sparse point clouds for training, and can accurately detect the complete vehicle shape. However, due to small height difference and small number of the point cloud at the top of the vehicle, the vehicle  is detected split by Region Growing algorithm [12] and only the front part is detected, as shown in Figure 10(a).
Scene A2 contains two adjacent bicycles, as shown in Figure 11(A box), and local 3D display of A box is shown VOLUME 9, 2021 in Figure 11(c). Since the two bicycles are similar in shape and close together, the features interfere with each other and the SECOND [18] detector cannot detect them as shown in Figure 12(b). Although Region Growing algorithm [12] can detect the adjacent obstacles, it cannot refine and distinguish them due to the large grid(40cm). The proposed method uses the second Region Growth algorithm to refine the adjacent obstacles detected by the first Region Growth algorithm, and successfully distinguishes them as shown in Figure 12(c).
Scene A3 contains a big truck. When it runs to the side of the Intelligent Vehicle, the LIDAR can only scan the side of the big truck. The shape of point cloud looks like a wall that is not involved in training set, as shown in Figure 13(A box), and local 3D display of A box is shown in Figure 13(c). At this time, the SECOND [18] detector cannot detect the big truck, as shown in Figure 14(b). In contrast, the Region Growing algorithm [12] with large grid (40cm) can detect the big truck as shown in Figure 14(a). The proposed method can also accurately detect the big trunk through the first Region Growth algorithm with large grid (40cm) as shown in Figure 14(c). Given that the point cloud at the junction of the front and body of the trunk is dense, the covariance of the grid is small. Then, the big trunk cannot meet the conditions of using the second Region Growth algorithm with small grid (20cm), avoided false detection.

2) QUANTITATIVE ANALYSIS
In order to further verify the accuracy and robustness of the method, the detection results of obstacles in the 5000 frames are evaluated by positive detection rate P tp , false detection rate P fp and missed detection rate P md [24] according to equation (7).      P tp = N tp /N sum P fp = N fg + N fs /N sum P md = N md /N sum (7) where, N sum is the total number of obstacles. N tp is the number of correctly detected obstacles. N fg is the number of ground point detected as obstacles. N fs is the number of detected split or incorrectly distinguished obstacles and N md is the number of undetected obstacles. Therefore, the higher P tp , the better the detection effect. The higher the P fp and P md , the poor the detection effect. Table 1 lists respectively the overall detection results of the three methods. In the urban dynamic environment, the detection results of the proposed method for different obstacles are obviously better than those of the other two methods. Firstly, the method in reference 12 is difficult to distinguish adjacent objects, therefore, P tp is relatively low. Due to the fact that the method is also greatly affected by height difference, it is easy to detect splitting for remote obstacles, and the P fp is relatively high. Then, the method in reference 18 is not affected by the height difference, but it is still difficult to detect objects that are not in the training set. Thus, the P md is relatively high. The proposed method combines Double-Layer Region Growing algorithm and Grid-SECOND detector to effectively distinguish adjacent obstacles, solve the splitting in remote obstacle detection, and realize the detection of the obstacles that are not involved in training set. Compared with the methods in reference 12 and reference 18, the overall detection accuracy is increased by about 12%.

B. EXPERIMENT ON KITTI DATASET
We evaluate the proposed method on the KITTI 3D object detection test set [23] which contains 7518 test point clouds, covering three categories: Car, Pedestrian, and Cyclist. For each class, detection outcomes are evaluated based on three difficulty levels: easy, moderate, and hard, which are determined according to the object size, occlusion state, and truncation level. We compare the proposed method with stateof-the-art methods of 3D object detection. Among them, for obstacles detected by the Region Growing algorithm under different types, this paper only need to manually judges whether the detection is correct to obtain the detection accuracy.
The performance of these 3D detectors on KITTI test set is shown in Table 2. Our method is superior to the state-of-the-art methods. Although PointNet++ uses a 2D detector that has been fine-tuned using ImageNet weights to achieve better results in Pedestrian detection, our network is trained from scratch and uses only LiDAR data. For car and cyclist detection, PointRCNN outperforms SECOND. While for the pedestrian detection, SECOND outperforms PointRCNN. In our method, SECOND detector is optimized based on the grid, and the Double-Layer Region Growth algorithm is used to solve the problem of the missed detection by Grid-SECOND detector. Therefore, our proposed method achieves better results than PointRCNN and SECOND.
The average time cost of these different methods on the KITTI test set is shown in Figure 15. Because Vox-elNet operates directly on sparse 3D points and convolutional middle layers is too complicated, the time-consuming of VoxelNet is the longest, and the inference time is 230ms. However, the PointRCNN and VeloFCN are relatively less time-consuming, and take about 80ms. Since the   SECOND detector uses improved sparse convolution, the time-consuming of the SECOND detector is the shortest, and the inference time is 50ms. We adopt 2D grids instead of voxel grids to reduce the time-consuming of the SEC-OND detector, and fuse the Double-Layer Region Growth algorithm to increase the overall detection accuracy of the proposed method, while ensuring that the total time remain is equivalent to that of the SECOND detector.
In order to further analyze the orientation accuracy of obstacle, we select Car category with higher orientation sensitivity for analysis. The performance comparison of average orientation similarity (AOS) on KITTI test set for the Car class is shown in Table 3. This paper extracts multiple features of different dimensions in the same grid as input, which is convenient for obtaining the 3D box proposals. Therefore, compared with SECOND detector that extracts point features separately, our method can obtain the obstacle orientation more accurately.

V. CONCLUSION
This paper proposes a fast and robust obstacle detection method that combines Double-Layer Region Growing algorithm and Grid-SECOND Detector. At first, this method adopts Grid-SECOND detector that replaces voxel grids with 2D grids and adopts multi-dimensional features to improve SECOND detector for obstacle detection. Then, the first Region Growing algorithm is used to cluster the remaining undetected and non-empty grids. Finally, the second Region Growth algorithm is used to refine the obstacles that are characterized by large volume and may contain multiple targets. It effectively distinguishes the adjacent obstacles and solves the splitting of remote obstacle detection, while making up for the missing detection problem caused by the limited training set.
The proposed method is verified in our extracted urban dataset and the KITTI dataset, respectively. The experiment results on urban dataset show that the proposed method is significantly better than Region Growing algorithm [12] and SECOND [18] detector for detecting remote obstacles, adjacent obstacles and obstacles outside the training set. The experiment results on KITTI dataset show that the proposed method outperforms other state-of-the-art methods in AP and AOS. Additionally, the proposed method adopts Grid-SECOND detector to reduce time-consuming and guarantees that the time cost of the proposed method after fusing Double-Layer Region Growing algorithm is similar to the SECOND detector, while increasing the overall obstacle detection accuracy. The overall method takes about 50ms and meets the real-time requirements of intelligent vehicle, which can provide more reliable obstacle information for the perception system.
Although the proposed method can distinguish adjacent obstacles by means of the second Region Growing algorithm with small grid size, it cannot accurately distinguish two obstacles with overlapping height. In the next step of work, we will divide the point cloud of obstacle with large volume into voxel grids, and use Region Growing algorithm for the voxel grids. It can save time cost while improving the refinement accuracy of large obstacles.