Omni-Directional Obstacle Detection for Vehicles Based on Depth Camera

,


I. INTRODUCTION
The numbers of vehicles on urban roads have increased with the development of transportation infrastructure, and more accidents are therefore occurring, making vehicle safety an important issue. For obstacle detection, many researches have focused on robots and autonomous vehicles. Numerous successful research institutes such as Mobileye, Daimler, and KIT Autonomous Vision Group (AVG) have proposed preferable research approaches in this area.
Vehicle environmental obstacle detection is a hot issue in the field of intelligent vehicle research. It is a prerequisite for intelligent vehicle to realize automatic driving, autonomous navigation and other functions. Among the methods for vehicle environmental obstacles detection, lidar, The associate editor coordinating the review of this manuscript and approving it for publication was Donato Impedovo . millimeter wave radar(MMW radar) and ultrasonic radar have been widely used.
The lidar detects the scene in front of the vehicle by emitting infrared laser beam, and measures the time delay of the reflected light to calculate the distance between the obstacle and the vehicle. The lidar has high measurement accuracy and wide detection range, but it is sensitive to bad weather (such as heavy rain and fog) and heavy and expensive, and can interfere with the laser signals emitted by other vehicles.
MMW radar can measure distance by detecting reflected wave, and it can also achieve enough accuracy under harsh conditions. However, MMW radar has a narrow view field (usually has the field with milli-radians level)and low lateral information accuracy, which means that unless the vehicle is far ahead, the radar can not detect vehicles from adjacent lanes. VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ Ultrasound radar is also a common obstacle detection method. However, ultrasonic radar is not suitable for omnidirectional obstacle detection because of its poor long-range detection ability, vulnerability to noise interference, poor environmental perception accuracy and robustness.
Comparing with radar method, computer vision based method can accurately extract the three-dimensional contour of the target. It has the advantages of wide detection range, complete road information and remote sensing. It is a very important sensor in vehicle obstacle detection. However, as with other sensors, visual sensors are sensitive to weather and illumination conditions. At the same time, the calculation of disparity map and the analysis and reconstruction of threedimensional scenes have large computation, which limits the application of binocular stereo vision in omni-directional obstacle detection.
At present, the depth camera gradually plays an important role in obstacle detection. It has the advantages of large detection distance, rich depth information, not affected by the environmental light, low cost and so on. Therefore this study proposes an omni-directional obstacle detection method using depth cameras. After installing depth cameras at different locations around the vehicle and determining the ROI, the obstacle area within the ROI is extracted using the proposed region growth method for depth images. In addition, an improved iterative normalized cut method is also proposed to cluster and segment the obstacles. Finally, omni-directional obstacle viewing is realized using 3D reconstruction and an overview of the obstacle regions.

II. RELATED STUDIES
To prevent traffic accidents, numerous methods have used sensors such as ultrasound, cameras, or lidar to obtain a real-time perception of the surrounding environment. Among such sensors, ultrasound achieves a strong reflectivity, but its point-to-point ranging method, slow sound wave propagation speed, required planar surface of the reflector, and interference between ultrasound waves limit its application in omni-directional obstacle detection. At present, studies on obstacle detection have focused on computer-vision [1] and lidar [2]- [8] based methods.
In the case of computer-vision based methods, the approach in [9] segments the obstacle region by using a given judgment threshold to obtain the binary image of binocular depth data; however, this method depends on the selected threshold, and thus it cannot segment the complete obstacles, and noise will seriously affect the obstacle detection. The methods in [5] and [10]- [12] use the V-disparity algorithm to detect obstacles. In [12], the original UV-disparity algorithm is applied to detect obstacles in an urban environment. However, this method is more sensitive to large obstacles, and will lead to a massive loss in the detection of small obstacles. Obstacle detection using a binocular camera and requiring binocular matching, has also been commonly applied. In [13]- [15], preferable matching methods were proposed. The method in [13] obtains an accurate binocular matching by training a convolutional neural network to compare image blocks. However, binocular matching is a time-consuming process, and incorrect matching will lead to incorrect obstacle detection. The approaches in [16]- [21] use an optical flow method to detect obstacles through monocular vision. In [18], an obstacle ROI region is established according to the different characteristics of the optical flow between the obstacle region and the background. The method described in [19] distinguishes between different obstacles using a combination of the optical flow and dense disparity information. In [21], the features of KLT are used to track and detect obstacles. However, the optical flow is very complex and time-consuming, and the method is therefore unsuitable for real-time omni-directional obstacle detection. In [31], inertial measurement unit (IMU) and other synchronized sensors are used to detect obstacles. In [22], fewer RGB cameras are applied to realize omni-directional obstacle detection. However, the use of fewer cameras causes a serious fisheye distortion, and the method suffers from other shortcomings, such as the need for a large amount of camera calibration and an inability to work at night. In addition, computer-vision based method is extremely sensitive to light, and obstacle detection under dark lighting is even worse, which may result in the shadows of obstacles being detected as new obstacles.
As examples of lidar-based method, the methods in [4], [5], [28] and [29] use lidar to detect obstacles, where [4] processes point clouds as 2D obstacles using the iterative least squares method. In [8], a Gauss mixture model based on 3D lidar is applied for obstacle detection. However, using lidar for omni-directional obstacle detection is not only expensive it is also easily affected by rainwater and dust. Furthermore, lidar typically has a large volume and blind areas when detecting obstacles closer to the car body, and is thus poorly suited for omni-directional obstacle detection.
In addition, deep learning has been increasingly applied to image region segmentation, such as in [23], [24], [30]. Because a deep learning method requires a large amount of sample data for training, it is difficult to achieve the requirement of real-time detection. The methods in [25], [27] apply machine learning to detect obstacles. Although this method obtains a better effect, its real-time performance cannot reach the real-time requirement of an omni-directional obstacle detection system when the vehicle is moving. Table 1 summarized the characteristics of different methods mentioned above based on their different features. Besides the shortcoming of the computer-vision method in the table, it cannot work under low illumination environment. The method we proposed uses the depth camera to perceive the surrounding environment, which is not affected by environment lighting and the obstacle detection accuracy and speed are also greatly improved.
Although the depth camera has the advantages of large detection distance, rich depth information, not affected by the environmental light, low cost and so on, there are few studies regarding obstacle detection by depth camera. Obstacle detection by depth camera mainly uses binary threshold segmentation, multi frame background segmentation and clustering segmentation methods to extract obstacles, but the detection effect using these methods is not very well and the some extracted obstacles have incomplete shapes. To solve these problems, this paper proposes a new method using depth camera for omni-directional obstacle detection, which greatly improved omni-directional obstacle detection accuracy and speed in the vehicle environment.

III. SYSTEM OVERVIEW
The omni-directional obstacle detection method proposed in this paper can be used to obtain a depth image of an omni-directional environment through an evenly distributed installation of ten depth cameras placed on the car body. The installation positions are shown in Fig. 1 (a). Fig. 1 (b) shows the overall process used to extract omni-directional obstacles with this method. In order to ensure that the detection covers the whole omni-directional space, the view field of each depth camera is set to 36 degrees(this degree depends on the number of the depth camera and the type of depth camera. Ten depth cameras are used in this study), which ensures the coverage of omni-directional space. When detecting obstacles, all the depth cameras generate depth images at the same time. After all the cameras have generated the depth images, a panoramic depth image will be formed.
The processing steps required include collecting the original image of the scene from the depth cameras, which means obtaining depth images from the ten depth cameras; Note that the detection range of different depth cameras is different, the obstacle depth data within the detection range will be received normally, and the obstacle depth data outside the range will not be received. Therefore, the depth image generated by each depth camera within the detection range is the real and valid depth data without any data outside the detection range. So except for noise and holes, there are no other detection errors in the depth data. Determining the ROI region for the depth images obtained; and denoising the ROI regions using a fast adaptive median filter, which can effectively eliminate holes and pulse noises in the depth image and helps with the subsequent processing. After the denoising, as the exact position(including position and rotating angle) of the cameras in relation to the car coordinate system is known, the transformation matrix of each depth camera can be obtained, therefore a panoramic depth image can be generated by transforming the depth points in ten depth images into the car coordinate system using their own transformation matrices(Discussed in section III.F). A newly proposed method of region growth for depth image and a fast inpainting method are applied, which can extract connected obstacle regions and fast repair or fill in the holes in depth image. Meanwhile, Because the connected obstacle region will include a large number of small and fragmented obstacle regions, in this study an improved iterative normalized cut algorithm is proposed to cluster and segment all obstacles that extracted, and thus the small and fragmentary obstacles can be merged into the surrounding obstacles, which is conducive to the entire treatment of obstacles in the later steps. After clustering and segmenting using iterative normalized cut, an omni-directional obstacle distribution map is obtained by overlooking 3D point clouds in the depth images.

A. DETERMINATION OF THE ROI IN DEPTH IMAGE
Among the different image segmentation algorithms available, the region of interest (ROI) algorithm is very important for image processing. On the one hand, it indicates the image range that requires processing, and on the other, it improves the processing efficiency. For obstacle detection around a vehicle using the depth image, the ROI applied in this study determines the region through which the vehicle can pass, further reducing the region to be processed and thereby improving the efficiency.
For the 3D point cloud in a depth space captured by a depth camera, the vehicle attitude plane equation is defined to segment the space of the 3D point cloud, as shown in Fig. 2. The vehicle attitude plane is parallel to the vehicle chassis, and this plane determines the maximum height of the obstacles that the vehicle can pass. Fig. 2 (a) and (b) shows the status of the ROI region at different times while the vehicle is moving. The region below the vehicle attitude plane is the non-ROI region, where obstacle detection is not required. The region above the vehicle attitude plane is the ROI region, where the objects in this region are obstacles and need to be detected.   The vehicle attitude plane is defined as and the coordinates of a certain point in the point cloud of the depth space are indicated by (x 0 , y 0 , z 0 ). If then this point belongs to the ROI space; otherwise, it is in the non-ROI space.

B. DENOISING OF THE DEPTH IMAGE
For a TOF(Time of flight) camera, such as the depth camera, the depth camera sends the light pulse to the target continuously, then receives the light returned from the object with the sensor, and obtains the distance between the target and the object by detecting the flight (round-trip) time of the light pulse. Because the reflectivity of different surfaces differs, when the reflectivity is low, the amount of light returned from the incident light will be reduced, and thus there will be a smaller number of holes in the depth image, similar to a black surface absorbing a large amount of light and forming holes in the image. We assume that the light of each pixel is reflected from a single location; in practice, however, the light reflected from other locations may be reflected to the same pixel many times, which will cause an error. At the edge of a stereo object, namely, the foreground and background, the returned light is a mixture of the foreground and background, and noise will be generated during the process of distinguishing between the two. For the omni-directional obstacle detection requiring a real-time performance described in this paper, a fast adaptive median filter is used to remove noise.
The adaptive median filter first detects whether each pixel in the ROI space is a noise point. By providing a window with the initial size, the median value of the pixels in the neighborhood is calculated, and whether this value is between the maximum and minimum pixel value in the neighborhood is determined. If this condition is satisfied, then the point is not a noise point and will not be filtered by the median filter. Otherwise, the range of the neighborhood is enlarged, and this process is repeated. For those pixels that are consistently judged as noise points, the enlargement of the neighborhood will not stop until the neighborhood size reaches the setting threshold. Fig. 3 shows the denoising effect when using the adaptive median filter on a certain depth image. It can be seen that noises, such as holes in the depth image, have been effectively removed.

C. METHOD OF REGION GROWTH FOR DEPTH IMAGE
As described in the previous section, adaptive median filtering is used to remove the holes and other noises in the depth image, which makes the obstacles in the depth image easier to be extracted. Each pixel in the depth image has its own depth value, and thus, the entire depth image can be seen as a large number of 3D point clouds in space. Therefore, this study proposes a new region growth method that is suitable for the depth image, and preliminarily extracts obstacles in the depth image.
1. For each point in the depth image, ID(x, y, z) is defined as the number of the region, where the point is located, the initial value of which is zero, and x, y, and z indicate the 3D coordinates of this point. In addition, S(ID(x, y, z)) is defined as its area, the initial value of which is also zero.
2. The similarity threshold is defined as T s . Starting from the coordinates (0, 0, 0) in the depth image, the image points are traversed along the three-dimensional coordinate axis in turn. When a depth point P 0 is encountered, it is confirmed whether there is a depth point P 1 in the radius T s of P 0 . If P 1 exists, then P 0 and P 1 are merged into the same region. The coordinate of P 0 is set as (x 0 , y 0 , z 0 ), the number of the region where P 0 is located is set as ID(P 0 ) = ID(x 0 , y 0 , z 0 ) = 1 and the area of P 0 is set as S(ID(x 0 , y 0 , z 0 )) = 1. In addition, the coordinates of P 1 are set as (x 1 , y 1 , z 1 ) because P 1 and P 0 belong to the same region, and thus ID(P 1 ) = ID(P 0 ) is set, that is ID(x 1 , y 1 , z 1 ) = ID(x 0 , y 0 , z 0 ) = 1 and the area of the region increases on its own, that is S(ID(x 1 , y 1 , z 1 )) = S(ID(x 0 , y 0 , z 0 )) + 1 = 2.
3. Continuing to traverse the depth image. If the next depth point is P 1 , it is confirmed whether depth point P 1 exists within the radius T s of P 1 and if so then P 1 and P 1 are merged into the same region. If the coordinate of P 1 is set as (x 1 , y 1 , z 1 ), then ID(x 1 , y 1 , z 1 ) = ID(x 1 , y 1 , z 1 ) = 1 and the area self-increases, that is S(ID(x 1 , y 1 , z 1 )) = S(ID(x 1 , y 1 , z 1 )) + 1 = 3. If the next depth point is P 2 , it is confirmed whether depth point P 2 exists within the radius T s of P 2 . If the next depth point is P 3 , it is confirmed whether depth point P 3 exists within the radius T s of P 3 . The check continues until all depth points have been investigated. 4. At a certain time, if there are no depth points belonging to a certain region within the radius T s of P , then P is an isolated point or a start point of a new region, the number of regions self-increases, which means ID(P ) self-increases, and the area of the region is set as 1, that is S(P ) = 1.
5. Steps 3 and 4 are repeated until all depth points are traversed. Fig. 4 shows a process diagram of this method, in which a block represents an obstacle point. In (a), obstacle point p 0 is scanned sequentially because there are no points that belong to a region within the radius T s of p 0 , therefore p 0 is a starting point of a new region; thus, the number of regions that the point belongs to is ID(p 0 ) = 1 and the area of this point is S(ID(p 0 )) = 1. Then, obstacle points P 1 , P 2 , and P 3 are searched sequentially. For p 1 , because there are no points that belong to a region within the radius T s of P 1 , the number of the region self-increases, that is ID(p 1 ) = ID(p 0 ) + 1 = 2 and the area of this point is S(ID(p 1 )) = 1 which means a new region is found. The red arrow indicates that the next obstacle point searched is P 2 , and because P 1 exists within its radius, thus p 2 belongs to the region of p 1 , that is ID(p 2 ) = ID(p 1 ) = 2 and the area self-increases as S(ID(p 2 )) = S(ID(p 1 )) + 1 = 2.
For p 3 which is sequentially searched, its number is the same as p 2 and the area self-increases. As shown in Fig. 4 (b), the process continues with the traversing of obstacle points along with the coordinate axis. Because p 4 exists within the radius of p 0 , p 4 is merged into p 0 , and the area self-increases by the largest area of the current region. When p 5 is searched, because p 4 exists within the radius of p 1 p 5 is merged into p 1 , and the area self-increases by the largest area of the current region. Fig. 4 (c) shows the growing result of all obstacle points, and that there are two obstacle regions generated. The number next to the obstacle region indicates the number of the obstacle region, and the number in the box indicates the search order of region growth. In addition, Fig. 4 (d) shows the area of the obstacles. As indicated, the area of both two obstacles increases gradually, indicating that the regions are growing.
This method is described as the following pseudo code, where the input is the depth image from the depth camera and the output is a set of obstacles. If p i is a depth point and no depth point in the radius T s of p i Then 6 If p 1 is a depth point and distance (p 1 , p i ) < T s Then 10 End 15 End VOLUME 8, 2020  Each pixel point is traversed to check whether it is a depth point. If the pixel point is a depth point and there are no other depth points within the radius T s of p i , it is considered an isolated point or a start point of a new region, and the ID of this point then self-increases and its area is reset to 1. If the pixel point is a depth point and there does exist another depth point within the radius T s of p i , then these two depth points are merged, and the ID of each point is set to the same number, and the area of this region self-increases. Fig. 5 shows the obstacle regions extracted from a certain depth image using the region growth method. Different obstacles are represented by different colors, and the number next to or in the obstacle represents the number of the region. As can be seen from Fig. 5, the region growth method can effectively extract obstacles in the depth image.

D. FAST INPAINTING METHOD FOR DEPTH IMAGE BASED ON KNN AND SPATIAL REGION GROWTH
Although the denoising and filtering of the depth image can eliminate some of the noise and holes, large holes still exist, which will affect the obstacle extraction in a later step. Generative Adversarial Networks (GAN) are State Of the Technical Art in image inpainting, however this method takes a long processing time and it is more important to quickly repair the holes using the obstacles information directly from the method of region growth in the previous step, which could accelerate the process of repairing the holes in depth image.
As described in the previous section, the obstacle region in a depth image is roughly extracted using spatial region growth. Because a real-time performance is required for vehicle obstacle detection, a fast inpainting method for the depth image based on KNN(k-NearestNeighbor) and spatial region growth is proposed herein.
Because the infrared reflectivity of different objects differs and the occlusion between objects always occurs, there are numerous holes on the obstacle surface in the depth image. An inpainting of the depth image is applied to repair or fill in these holes. A spatial correlation is also analyzed according to the obstacle range obtained during the previous step, and the pixel that best matches the pixel in the hole is determined.
For the depth data, the depth value is defined as D(u, v), and the point with a depth value D(u, v) of zero is a hole. The depth image is then binarized, and the formula applied is as follows: The result of binarization is shown in Fig. 6 (b). The holes H () in the figure are shown in black, and the obstacle range R E extracted using the spatial region growth is indicated in green.
For each hole, the treatment of the morphological corrosion and expansion is first carried out to expand the range of the hole area, allowing the neighborhood of the holes to be included. The neighborhood is defined that it includes a hole as R ED (R ED ∈ R E ), the neighborhood range of the hole, which is defined as R NP , can be obtained by determining the difference between the region after the treatment of the corrosion expansion and the original area through the following formula: The neighborhood R NP is shown in the red region in Fig. 6 (d).
To fill in a hole, it is necessary to determine its depth. Because the depth changes uniformly within a neighborhood, in this study, a new method for calculating the depth of the holes based on the use of KNN is proposed. To estimate the depth of a hole close to the neighborhood, it is given the same value as the depth the of neighborhood itself. For a pixel A in a hole of a depth image, its eigenvector is defined as A ij (X i , X j ) and its estimated depth is D A .
Pixel A is selected as the center point, and the related attribute values of n 2 − 1 pixels in its n × n dimensional neighborhood (where n is an integer of larger than 2) are calculated. The reference pixel is marked as B r , its characteristic vector as B rij (X ri , X rj ), and its depth value as D Br , where r = 1, 2, . . . , n 2 − 1. Because the depth value changes evenly within the hole region and the depth value is related to the distance of the neighborhood points of the hole, the ratio of Euclidean distance d r is used to represent the correlation between the hole point and the neighborhood point, which is defined as the weight coefficient ω r and the sum of the weights in n × n dimensional neighborhood is 1, that is, The weights of each hole point and its neighborhood point can be calculated as follows: where the Euclidean distance d r is expressed as follows.
Here, x ri is the ith dimensional coordinate of the rth point, and x rj is the jth dimensional coordinate of the rth point. The smaller the Euclidean distance d r is between the two points, the larger the weight coefficient and the higher the similarity. The depth value D A of the pixel in a hole is calculated using the valid depth information of the neighboring points, which is expressed as follows: That is, The calculated depth value D A of the hole pixel is filled into the corresponding position of the hole to complete the inpainting of the depth image. The final inpainting result of a depth image with holes ( Fig. 6 (a)) is shown in Fig. 6 (e), which indicates that the holes are well repaired.

E. CLUSTERING SEGMENTATION USING ITERATIVE NORMALIZED CUT
In the previous section, different obstacles are obtained using the region growth method for the depth image. However, the method is easily affected by the similarity threshold T s and may result in the fragmentation of obstacles if this threshold is too small, which is not conducive to the overall extraction of obstacles. Herein, an improved iterative normalized cut is proposed to create the normalized segmentation and merging the fragmented obstacles.
For obstacles obtained using region growth for the depth image, an undirected graph with weights is constructed using the iterative normalized cut algorithm for these obstacles. Each node in the graph represents an obstacle, and the weights on the edges represent the approximate relationship between obstacles.
By establishing a normalized minimum cut on the established graph, the obstacles in the graph are segmented for the first time, and this segmentation is carried out iteratively until no new areas can be segmented. Finally, the clustering segmentation of all obstacles is completed.
An undirected graph G = (V , E) is constructed for obstacles obtained by region growth for the depth image, where V represents the obstacles extracted from the previous step, and E represents edges, which are the connections between obstacles. Assume that graph G can be divided into disjointed parts A and B (V = A ∪ B). One partition of graph G can be defined as follows:  where C w (A, B) denotes the weights between regions A and B, and w(i, j) denotes the weights between obstacles i and j. The weights are defined as the spatial distance between obstacles. Iterative normalized cut achieves the purpose of normalized segmentation by calculating the normalized weights. The calculation formula is as follows: Here, i∈A,j∈V w(i, j) and i∈B,j∈V w(i, j) represent the sum of the weights from A and B to all nodes in the graph, respectively. Therefore, the optimal segmentation of A and B can be achieved by finding the minimal value of NC w (A, B). The minimal value of NC w (A, B) can be solved by solving the eigenvalues and eigenvectors of the matrix. Let the number of obstacles be n, let x = (x 1 , x 2 , · · · , x n ), where x i = −1 denotes that B contains node i, and x i = 1 denotes that A contains node i. Let W be a symmetric matrix of n × n, whose element w ij is W (i, j) in (10) can thus be deduced as follows: where y is the eigenvector and t is the eigenvalue. Then, the location of the obstacle to be segmented is the eigenvector corresponding to the second smallest eigenvalue. Initially, the iterative normalized cut algorithm is used to cluster and segment all obstacles obtained through the region growth method for the depth image, and two segmentations A and B of the obstacles are obtained. Iterative normalized cut is then applied again for both sections A and B, and the separated obstacle continues to be iteratively clustered and segmented by the algorithm until it can no longer be segmented. Fig. 7 shows the process of the iterative clustering and segmentation of obstacles in two scenes using the iterative normalized cut algorithm after extraction using the region growth method for the depth image. Fig. 7 (a) shows the obstacles extracted, in which a total of ten obstacles are found in scene #2 and nine are found in scene #1. Fig. 7 (b) shows the results of clustering and segmentation applied for the first time, in which all obstacles are divided into sections A and B; the objects in both sections are, then clustered and segmented into new sections A and B, respectively.
For scene #1, Fig. 7 (c) shows the clustering and segmentation results of section A, and Fig. 7 (d) shows the results for section B. In Fig. 7 (c), section A is segmented into new sections A and B, and in Fig. 7 (d), section B is also segmented into new sections A and B. The final clustering and segmentation results are shown in Fig. 7 (e). The results indicate that the nine obstacles are clustered into one to four obstacles, which effectively reduces their fragmentation.
In scene #2, because there are no obstacles that can be segmented in section B, section A is clustered and segmented, the results of which are shown in Fig. 7 (c). In Fig. 7 (c), section A is segmented into new sections A and B. Iterative segmentation for part A is shown in Fig. 7 (d), and that for part B is shown in Fig. 7 (e). The final clustering and segmentation results are shown in Fig. 7 (f). As the results indicate, the ten obstacles are clustered into one to five obstacles, which also effectively reduces their fragmentation.

F. DEPTH IMAGE REGISTRATION AND VISUALIZATION OF OMNI-DIRECTIONAL OBSTACLES
As the exact position(including position and rotating angle) of the cameras in relation to the car coordinate system is known, the transformation matrix [T ] of each depth camera can be obtained, and the inverse transformation matrix [T ] could also be deducted from this matrix.
When detecting obstacles, all the depth cameras generate depth images at the same time. After all the cameras have generated the depth images, depth points in ten depth images are transformed into the car coordinate system using their own transformation matrices [T ], which completes the depth point cloud registration in order to generate the panoramic depth image.
After the obstacles in the panoramic depth image are extracted using our method mentioned above, the obstacles in the panoramic depth image are transformed according to the inverse transformation matrix [T ] deducted from transformation matrix [T ], in order to generate a omni-directional obstacle map. These two processes as shown in Fig.8.
3D visualization technology is used to show an omnidirectional scene by overviewing the obstacles. By rendering each depth point and constructing obstacle point clouds, the purpose of a real-time obstacle overview can be achieved.

IV. EXPERIMENT AND ANALYSIS A. OMNI-DIRECTIONAL OBSTACLE EXTRACTION
In this experiment, to verify the effectiveness of the proposed method, the effect of omni-directional obstacle detection under different scenarios is tested. Depth cameras are installed at different locations of the vehicle, as shown in Fig. 1 (a). The figure also shows that ten depth cameras are installed around the car body to collect depth information of the omni-directional environment. An on-board computer is used for the experiment and is equipped with an Intel i3-4130p processor with a main frequency of 3.4 GHz and 4G memory, Intel realsense D430 module is used as the depth camera.
The effect of omni-directional obstacle detection is tested under both daytime and nighttime environments, and the results are compared with current mainstream obstacle detection methods. Fig.9 shows the results of omni-directional obstacle detection using our method during the daytime. The depth image of the environment is obtained using depth cameras installed in ten locations, and the obstacles are then detected using our proposed method. The detected obstacles are represented by region using a different color in the original and depth images. Among these regions, red indicates the primary obstacle region, where the obstacle is nearest to the vehicle. As indicated, our method can extract obstacles more completely, and small and fragmentary obstacles (such as tree crowns and branches) can also be detected more completely. Obstacles are segmented coherently and reasonably as a whole. No detection errors in which one obstacle is segmented into different regions occur, or different regions of different obstacles are merged into new obstacles. A top view of the obstacles represents an overview of the extracted obstacle point cloud through a 3D visualization. The term ''Omni'' indicates the overview of all obstacle point clouds showing the obstacles that exist in an omni-directional environment. Fig. 10 shows the results of omni-directional obstacle detection using our method at night. Because the environment is purely dark without any interference of other lighting, the depth camera can work well. Our method is also used to detect obstacles in this type of environment, and detected obstacles are represented by regions with different colors in both the original and depth images. Among these regions, the red region is also the primary obstacle region, representing the obstacle that is nearest to the vehicle. The experiment results show that the obstacles detected by our method at night are clearer and more complete than those detected during the daytime, and the obstacles are also more reasonably segmented without any abnormalities. The top view image of the obstacles using 3D visualization shows the distribution of omni-directional obstacles in a dark environment.
At present, vision-based obstacle detection focuses on obstacle extraction using binocular vision. The binocular vision method usually uses the binary threshold segmentation, clustering analysis, UV-disparity map, and the optical flow to extract obstacles.
Binary threshold segmentation simply sets a distance threshold to segment the pixels in the binocular depth image according to the distance threshold, such as retaining the pixels within the distance threshold and removing the pixels outside the distance threshold. The clustering analysis method combines the pixels with the same attributes into the same region through a certain statistical method of pixel distance and density to achieve obstacle extraction. UV-disparity map determines the position of obstacles by counting the number of disparity on the u-map and v-map respectively and detecting the straight lines on the u-map and v-map.The optical flow method assigns a velocity vector to each pixel in the image, thus forming a motion vector field. The velocity vector formed by moving object is different from that of background, so the position of moving object can be calculated.
The detection results using the methods above are compared with those of our method. Because the vision-based method cannot work at night, the experiment was only carried out during the daytime. Fig. 11 (a) shows the experiment results of obstacle extraction using binary threshold segmentation. The experimental scene applied is the same as that shown in Fig. 9. The segmentation threshold is set as T = 1.0, which indicates that an obstacle can be considered to exist when the distance between the points in the binocular depth image and the camera (i.e., the vehicle) is less than 1 m. We can see from the segmentation result that all of the segmented obstacle are the depth points closest to the vehicle.
However, because only the threshold is used for segmentation, the segmented obstacles are incomplete and fragmented, which makes it impossible to analyze and achieve obstacle avoidance in a further step. Fig. 11 (b) shows the experiment results of the clustering segmentation. We use the K-means clustering algorithm to segment and extract obstacles from the binocular depth images. After the extraction of the obstacles, the purpose of extracting obstacles closer to the vehicles is achieved by retaining obstacles with smaller depth values. However, the K-means algorithm is limited by the value of K, and this clustering segmentation algorithm is more inclined toward the overall segmentation based on the spatial location; however, the obstacle segmentation of the depth image should be more inclined toward segmentation based on the connected region. Because the segmentation is based on the spatial location, we can see from Fig. 11 (b) that obstacles such as the tree crown are not extracted, and the trunk and other obstacles are clustered into an entire obstacle. Because the spatial location of the tree crown is far from the obstacles on the lower side, the K-means algorithm incorrectly divides the trunk and other obstacles on the lower side into the same obstacles. Fig. 11 (c) shows the result of extracting obstacles in the depth image using UV-disparity maps. The UV-disparity map determines the range of obstacles by detecting the lines in the map. However, because numerous fragmentary obstacles (such as trunks and leaves) exist in the test scene, it will be quite difficult to detect lines in the UV disparity map, and the detection error will be quite large. Because the UV-disparity map determines the obstacle range by detecting the line in the U and V maps, respectively, it can better extract regular obstacles. For complex and irregular obstacles such as leaves, trunks, and grasses in this experiment, the UV-disparity map can only roughly determine their locations and cannot accurately segment different obstacles; therefore, the false recognition rate is very high. In Fig. 11 (c), we detect lines that clearly exist. From the results, we can see that the overall range of the obstacles can be determined, although the obstacles cannot be accurately segmented, and the spatial overlapping obstacles cannot be distinguished. Therefore, the UV-disparity map is not suitable for obstacle detection in complex scenes. Fig. 11 (d) shows the obstacle detection result using the optical flow method. In addition, the yellow point in the image is the direction of the optical flow. Because the optical flow method first needs to extract image feature points, different lighting and color will affect the extraction of the feature points, and therefore corner and feature points with significant changes can be properly extracted, and the obstacles that belong to these points can also be well extracted. Therefore, we can see from the results in Fig. 11 (d) that parts of the obstacles are extracted, although the obstacles without distinct feature points such as trees and grasses are not well detected. Fig. 11 (e) shows the obstacle detection result using lidar and deep learning. In the training phase, as it is impossible to recognize all of the obstacles in the surrounding environment due to limited samples, only cars and trees with different shapes were trained for this experiment. However, as the samples trained are very limited, this method cannot detect all the obstacles and small obstacles are not completely detected.

B. COMPARISON OF EFFICIENCY AND ACCURACY OF OMNI-DIRECTIONAL OBSTACLE DETECTION
For omni-directional obstacle detection for vehicles, the detection efficiency will affect the driving safety. Therefore, we compare the efficiency of omni-directional obstacle detection with our method and the methods tested above. In this experiment, the frame rates of the depth camera and the RGB camera were both 30 fps. The vehicle drove for 5 minutes in an environment with complex obstacles and recorded 9,000 frames of depth and binocular images. The obstacles in each frame were detected using our proposed method and the binary threshold segmentation, K-means clustering analysis, UV-disparity map, and optical flow methods, respectively, and the total time required to complete the detection of 9,000 frames, and the maximum, minimum, and average times required to detect a single frame, were calculated. Table 2 shows the efficiency of the different methods. Fig. 12 shows the graph of the time required for omnidirectional obstacle detection for all frames and a single frame based on the table. Because K-means is a process of iterative convergence, it takes a long time to complete the detection. The UV-disparity map method needs to scan every pixel in the U map that corresponds to the column of the original depth map and analyze the number of statistical disparity values. Each pixel in the V map that corresponds to the row of the original depth map also needs to be scanned and the number of disparity values are then analyzed; therefore, the UV-disparity map method is also time-consuming. Binary threshold segmentation is in fact not an obstacle detection method. A partial depth image is preserved using a predefined threshold, and it is therefore meaningless to calculate the required time. The optical flow method takes a long time to process each frame because of its complexity and numerous computations required. The deep learning method also takes a long time because of its complexity. In contrast, our method has a faster processing speed and is suitable for real-time omni-directional obstacle detection.
For the accuracy of omni-directional obstacle detection, we collected a large number of omni-directional depth images and their corresponding binocular images in different environments with different obstacles, and detected the obstacles in these images using both our method and other methods mentioned above, and manually marked the locations of the major obstacles in each environment to establish a dataset. We used the Intersection Over Union (IOU) as the evaluation metrics for this dataset. When marking the locations of the major obstacles in each environment, the groundtruth boxes that cover the obstacles were also marked. The established data set consists of 5000 depth images and their corresponding binocular images(2500 depth images and 2500 corresponding binocular images), and these images include 9 scenes and 257 obstacles. This data set as shown in Table 3. We considered correctly detection(true positives), wrong detection(false positives) and undetected(false negatives) to calculate accuracy, precision and recall. When each obstacle was detected, we also use a box that can cover the obstacle, and then the area of the overlap of this box and the groundtruth box was calculated to compare the detection similarity. We define the ratio of this area to the area of the ground-truth box as the detection similarity, and we calculated the average similarity after all the images in data set were detected.
The experiment results are shown in Table 4, and Fig. 13 shows a graph of the accuracy, precision, recall rates and average similarity based on the table.
In Table 4, ''Correct detection'' indicates the number of the images that all of the obstacles in each image are correctly detected. Because binary threshold segmentation only preserves a portion of the disparity region, it does not have the ability to extract obstacles as a whole, and therefore this method is basically unable to achieve a proper obstacle detection. As this method segments the depth image though a set threshold, only the obstacle regions in the threshold can be detected, which causes most of the obstacles will not be detected and thus the accuracy and recall rate are very low. As this method can not really segment the obstacles, therefore most of the detected obstacles have incomplete shapes, which cause the detection similarity to the ground-truth box are also very low. The K-means clustering analysis can extract obstacles as a whole but is more inclined toward an overall segmentation according to the spatial location. However, the obstacle segmentation of the depth image is more inclined toward determining the connected regions of the obstacles. Therefore, K-means results in numerous obstacles being missegmented into different obstacles. Although obstacles are divided into different parts, most obstacles that marked in the data set were detected, and only a few wrong or incomplete detections exist. Therefore this method has a relatively high precision and recall rate. By contrast, the UV-disparity map detects obstacles by detecting lines in a UV graph. If there are numerous fragmentary and complex obstacles in the environment, the detection of the lines will be extremely difficult to achieve, resulting in a number of incorrectly detected obstacles, which lead to poor accuracy, precision and recall rate. Because UV method detects the lines to judge the obstacle, therefore detected obstacles usually have incomplete shapes, which will cause the area of the detected obstacles largely differ from the area of the ground-true boxes, and thus the detection similarity is also very low. Because the environmental light and color of the obstacle affect the extraction of the corner and feature points, the optical flow method cannot reliably extract all obstacles, and a few of the obstacles will not be completely extracted, which cause the accuracy, recall rate and average similarity are not very high. But for the obstacles that have the strong feature points, this method could well detect obstacles, so the precision rate can reach a high level. For deep learning, because only the sample of vehicle and tree were trained for the experiment, therefore not all the obstacles are correctly detected. But for the trained samples, almost all the obstacles are correctly and completely detected, therefore it has a high precision rate and high similarity to the ground-truth box. Because there are many other obstacles are not detected in the experiment environment due to limited samples, the accuracy and recall rate are relatively low. By contrast, first as our method could effectively extract small and irregular obstacles in complex scenes using depth image region growth, the number of detected obstacles are largely increased, which avoid small obstacles being missing and the shape of detected obstacles being incomplete, and thus our method could improve the accuracy and recall rate. Second, as a large number of fragmentary obstacles can be merged into the nearest obstacles through the proposed iterative normalized cut algorithm, it ensures that the detected obstacles could have a more complete shape and thus improves the precision rate and average similarity to the ground-truth box. All of these advantages make our method could more accurately and completely detect obstacles in all kind of complex environment, and more conducive to the subsequent overall treatment of the obstacles for analysis and avoidance.

V. DISCUSSION AND CONCLUSION
To solve the problem in which the obstacle detection around a vehicle is easily affected by the environment light and the material of the obstacle, this study proposes using a depth camera to obtain the depth image of the environment and extract obstacles from this image. A depth camera has a large detection coverage and can work at day or night, and is therefore more suitable for omni-directional obstacle detection than ultrasound and visual sensors.
In this study, the application of the ROI for a depth point cloud was proposed, which avoids collisions with obstacles by defining a plane parallel to the chassis, and the processing efficiency of the depth point cloud was promoted. The transformation matrix of each depth camera is used to generate the panoramic depth image. A fast inpainting method for depth image was also proposed, which effectively eliminates the noise and holes on the obstacle surface in the depth image. A region growth method for the depth image was also proposed. By analyzing the connected region in a depth image, the depth image is segmented, and obstacles are roughly extracted. An improved iterative normalized cut method was also proposed to cluster and segment all obstacle regions extracted using the region growth method for the depth image. The fragmented and small irregular obstacles are merged with the nearest obstacles in the spatial position to generate more complete obstacles, which is conducive to the subsequent overall treatment of the obstacles for analysis and avoidance. Finally, the inverse transformation matrix deducted from transformation matrix of their original depth camera is applied to generate a omni-directional distribution map for obstacles around the vehicle, thereby increasing driving safety. Based on an obstacle detection experiment conducted in a complex environment during both day and night, the effect of our detection method during the daytime was shown to be more effective than that of other mainstream methods in extracting complete obstacles, whereas for fragmentary and small obstacles such as branches, crowns, and leaves, our method can also achieve a complete extraction. Because the depth camera can work better in a dark environment, the detected obstacles have a more complete shape and clearer segmentation level at night, whereas the other mainstream methods cannot work at all at night.
Using a depth camera and a binocular camera to collect the depth and binocular images of the environment with complex obstacles, and applying different methods to detect obstacles in each image and calculate both the time required and the detection rate, the results indicate that our method has a faster detection speed than that of the other methods, as well as higher detection, precision, and recall rates. Each of these advantages shows that our method is more suitable for realtime omni-directional obstacle detection for use in vehicles.