Map-Enhanced Ego-Lane Detection in the Missing Feature Scenarios

As one of the most important tasks in autonomous driving systems, ego-lane detection has been extensively studied and has achieved impressive results in many scenarios. However, ego-lane detection in the missing feature scenarios is still an unsolved problem. To address this problem, previous methods have been devoted to proposing more complicated feature extraction algorithms, but they are very time-consuming and cannot deal with extreme scenarios. Different from others, this paper exploits prior knowledge contained in digital maps, which has a strong capability to enhance the performance of detection algorithms. Specifically, we employ the road shape extracted from OpenStreetMap as lane model, which is highly consistent with the real lane shape and irrelevant to lane features. In this way, only a few lane features are needed to eliminate the position error between the road shape and the real lane, and a search-based optimization algorithm is proposed. Experiments show that the proposed method can be applied to various scenarios and can run in real-time at a frequency of 20 Hz. At the same time, we evaluated the proposed method on the public KITTI Lane dataset where it achieves state-of-the-art performance. Moreover, our code will be open source after publication.


I. INTRODUCTION
WITH the development of artificial intelligence, autonomous driving systems have become research hotspots in both academia and industry. As one of the essential modules, ego-lane detection allows the car to properly position itself within the road lanes, which is crucial for subsequent control and planning.
Some typical ego-lane detection results in the KITTI Lane dataset are shown in Fig. 1, where the ego-lane is labeled as green. It can be seen that there are three main tasks for ego-lane detection: left boundary detection, right boundary detection and upper boundary detection. The upper boundary detection is mainly to detect the preceding vehicle, which has been studied by most scholars in recent years and has achieved encouraging results. Therefore, this paper focuses on the left and right boundary detection, that is, lane line detection and road curb detection in KITTI Lane dataset (the road in the KITTI Lane dataset is a two-way road and the vehicle is driving on the right lane).
For lane line detection and road curb detection, one of the most challenging scenarios is missing feature. Fig. 1 shows several typical examples of missing feature in the KITTI Lane dataset, including lane marking wear, lighting changes, and even no visible features. To tackle this challenge, previous methods [1], [2] have been devoted to proposing more effective feature extraction methods to obtain as many features as possible, but they are very time-consuming and cannot deal with extreme scenarios. In addition, model fitting plays an important role when features are partially missing or other objects are interpreted as features [3]. Therefore, this paper focuses on obtaining the compact high-level representation of lane boundaries through model fitting, thereby solving the missing feature problem.
In recent decades of research, various mathematical representation models have been used for model fitting, ranging from simple straight line models to complex spline models. Many researchers prefer model fitting using a straight line [4]-[7], which is a good approximation at short range and is the most common case in highway scenarios. Although the straight line model is efficient and simple, it fails on curved roads, so some researchers propose to use a circular arc as the lane model [8], [9]. Furthermore, quadratic polynomials [10], [11] and cubic polynomials [1], [12] are also widely used for model fitting in curved situations. In recent years, more and more researchers prefer to use splines for model fitting, including the cubic spline [13], Catmull-Rom spline [14], B-spline [15] and so on. Although mathematical representation models have been widely used for model fitting, their performance is profoundly affected by the quality of lane features. In some extreme scenarios, the fitted parameters become highly random, which results in a large shape error between the fitted lane and the real lane.
Nowadays, most autonomous driving systems have access to digital maps that contain rich geometric and semantic information about the environment. This prior information has been proven to have a strong capability to enhance the performance of algorithms in perception [16], prediction [17], and motion planning [18]. In this paper, we exploit OpenStreetMap (OSM) [19], a free online community-driven map, to enhance our ego-lane detection algorithm. OSM data is structured using three basic geometric elements: nodes, ways, and relations [20]. Ways are geometric objects like roads, railways, rivers, etc. A way is a collection of nodes, where the number of nodes is determined by the complexity of the object. Taking the road as an example, a straight road may consist of only two or three points as shown in Fig. 2(a), and a curved road may consist of dozens of points as shown in Fig. 2(b), ensuring the consistency of the OSM road shape and the real lane. Therefore, we use the OSM road shape as the lane model, which is irrelevant to lane features and robust to a variety of missing feature scenarios. However, the OSM data is provided by user contributions, so it is coarse and rife with errors. At the same time, the localization system employed on the vehicle might be noisy. These two problems lead to a position error between the OSM data and the real lane. It can be seen from Fig. 2 that the projection of the OSM data is close to the lane, so the error is relatively small. Therefore, we use a search-based optimization method to minimize the distance between the OSM data and the extracted features, which can effectively improve the detection accuracy of the algorithm.
In this paper, we present a novel map-enhanced ego-lane detection framework to address the missing feature problem. Compared with other methods, we employ the OSM road shape as lane model, which is irrelevant to lane features. By minimizing the distance between the OSM road shape and extracted lane features, the position error is eliminated, thereby improving the accuracy of detection results. The main contributions of this paper are as follows:
1) Exploit the OSM road shape as lane model, which is highly consistent with the real lane shape and irrelevant to lane features.
2) Propose a search-based optimization method to eliminate the position error between the OSM data and the real lane, thereby improving the detection accuracy.
3) Propose an efficient ego-lane detection framework being able to run in real-time at a frequency of 20 Hz on a single CPU.
The remainder of this paper is organized as follows. Section II presents the related work of ego-lane detection. In Section III, the proposed map-enhanced ego-lane detection framework is presented in detail. Experimental results are presented in Section IV. Finally, we conclude the paper in Section V.

II. RELATED WORK
This paper focuses on solving the problem of missing feature by exploiting the OSM road shape as lane model. Therefore, the related work will be carried out in two aspects: lane modeling and map using.

A. Lane modeling
In recent years, lane modeling has played an important role in ego-lane detection. It refers to obtaining a mathematical representation of road lane markings [21]. Different researchers have proposed different lane models. Some use only simple straight lines, while others prefer more complex models, such as polynomials, clothoids, splines, and so on. The straight line model [4]-[7] is the most commonly used geometric model. It is a good approximation for short distances and is the most common model in highway scenes. To increase the robustness of model fitting, several additional constraints have been applied, such as parallelism [22], [23], road or lane width [24], and so on. The straight line model is simple, but its applicability is limited, especially at long distances or on curved roads.
In [8], [9], curved roads are modeled in the bird's eye view using circular arc. Generally, the curvature of the road is small and continuous, so the circular arc is a conventional lane model on a ground plane [25]. However, circular arc cannot handle more general curved roads.
Because they perform well on more general curved roads, polynomials are also widely used for model fitting, including the quadratic polynomial [10], [11], the cubic polynomial [1], [12] and so on. However, their fitting quality at the connection between a straight lane and a circular curve is limited [3].
Several researchers [26], [27] assume that the shape of the road is a clothoid, which is defined by the initial curvature, the constant curvature change rate, and its total length. A clothoid can be approximated by a third-order polynomial and is used to avoid abrupt changes in steering angle when driving from straight to circular roads.
Splines are smooth piecewise polynomial curves, which have been popular in previous studies [28]. A spline-based lane model describes a wider range of lane structures, as it can form arbitrary shapes with different sets of control points [29]. Various spline representations have been proposed for lane modeling. In [2], [13], a cubic spline with two to four control points is used for lane modeling. Wang et al. [14] present lane modeling based on the Catmull-Rom spline (also known as the Overhauser spline), which is a local interpolating spline developed for computer graphics purposes. The B-spline was introduced in [15], which can provide a local approximation of the contour with a small number of control points. Furthermore, a nonuniform B-spline was used to construct the left and right lanes of the road [30]. A third-degree Bezier spline is also used to fit the left and right boundaries of the road surface [31]. The lane model was also improved to generate a B-snake model [32] or a parallel-snake model [33].
Several combination models have also been proposed as lane models. In [34], [35], the image is divided into multiple slices, and lanes in each slice are fitted with straight lines to form a piecewise linear model. Jung et al. [36] proposed a linear parabolic lane model consisting of a linear function in the near-field and a parabola in the far-field. The nearby straight line model provides the robustness of the model, and the parabola provides the flexibility of the model. Similar to [36], the combination of a near-range straight-line model and a far-range clothoid model was proposed by [37].

B. Map using
A map that contains rich geometric and semantic information about the environment is essential for autonomous driving systems. Impressive results have been achieved by introducing maps to perception [16], prediction [17], and motion planning [18]. Various map-based methods have also been proposed for ego-lane detection.
In [38], the curvature of the road was first obtained from the GPS position and the digital map, and then it was used to determine whether it was driving on a straight road or a curved road. Different road regions use different lane detection modules, of which straight roads are fitted using linear models and curved roads are fitted using circular arc.
To enhance the performance and robustness of the lane detection system, Möhler et al. [39] proposed to extract lane width and curvature of upcoming road segments from a digital map to adapt certain configuration parameters. In addition, clothoid is used for model fitting.
Döbert et al. [40] use the digital map as a guide for lane detection in two ways. One is to widen the map during feature extraction and project it onto the image to form a search area. The other is to project the geometry of the digital map onto the image during the tracking process, thereby defining a guide curve, and then to resample the measurements along the guide curve to estimate the new model. Similar to [39], the lane model is also a clothoid curve.
As described in Section I, all mathematical representation models suffer from large parameter arbitrariness when features are missing. The methods that use maps still rely on mathematical representation models, so the problem remains. In this paper, we use the road shape in OSM data as the lane model and transform the fitting problem into a search-based optimization problem. The advantage is that the prior knowledge provided by the map is effectively used, and the missing feature problem is addressed.

III. EGO-LANE DETECTION
In this section, our map-enhanced ego-lane detection framework will be described in detail. First, we describe the OSM data format and how to obtain the data needed for this paper. Next, we show the preprocessing step, which contains Region of Interest (ROI) selection and lane feature extraction. Finally, we explain how OSM data is used for ego-lane detection.

A. OpenStreetMap
In 2004, the OpenStreetMap project was started with the goal of creating a free to use and editable map of the world [19]. Different from commercial maps like Google, Navteq, and Teleatlas, OSM is created by volunteers in various ways, for example by supplying GPS tracks using portable GPS devices, labeling objects such as buildings in aerial imagery or by providing local information [41]. By the end of 2019, more than 6 million registered users had contributed to the project, and more than 7 billion GPS track points had been submitted. The primary reason why we use OSM to assist ego-lane detection is that it can be freely accessed and used under the Open Database License.
OSM data can be accessed via the corresponding website in XML format, and users can download the map of an area of interest by specifying a bounding box. Fig. 3(a) shows the raw OSM data of a sample in the KITTI Lane dataset. OSM data is structured using three basic entities: nodes, ways, and relations. Nodes are geometric elements, which contain the GPS coordinates and a list of available tags. Ways are linear-shaped or area-shaped geometric objects like roads, railways, rivers, etc. They are defined by reference to a list of ordered nodes. Relations are used to form more complicated structures with members of nodes and ways.
The OSM data is in the world coordinate system, but our ego-lane detection algorithm is performed in the vehicle coordinate system. Therefore, the OSM data needs to be transformed to the vehicle coordinate system first. Fig. 3(b) shows the result of our coordinate transformation. It should be noted that the data beyond the image view is clipped. OSM data provides rich geometric information. However, for our purposes, the most useful information is the road that the ego-car is traveling on, so other geometric information is also clipped. Finally, the OSM data containing only the currently traveled road, after being transformed and clipped, is shown in Fig. 3(c). It will be used as the lane model later.
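As an illustration of the node and way structure described above, the following minimal sketch reads road geometry from an OSM XML extract using only the Python standard library. The element and attribute names (`node`, `way`, `nd`, `tag`, `lat`, `lon`, `ref`) follow the public OSM XML format; the ids and coordinates in the sample are hypothetical, and the function name `load_roads` is introduced here for illustration only. It is not the implementation used in this paper.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written OSM extract; ids and coordinates are hypothetical.
OSM_XML = """<osm version="0.6">
  <node id="1" lat="49.0000" lon="8.4000"/>
  <node id="2" lat="49.0005" lon="8.4002"/>
  <node id="3" lat="49.0010" lon="8.4006"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <nd ref="3"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>
"""

def load_roads(xml_text):
    """Return {way_id: [(lat, lon), ...]} for every way tagged as a highway."""
    root = ET.fromstring(xml_text)
    # Nodes carry the GPS coordinates; index them by id for lookup.
    nodes = {n.get("id"): (float(n.get("lat")), float(n.get("lon")))
             for n in root.findall("node")}
    roads = {}
    for way in root.findall("way"):
        tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
        if "highway" not in tags:
            continue  # skip rivers, railways, buildings, ...
        # A way is an ordered list of node references.
        roads[way.get("id")] = [nodes[nd.get("ref")] for nd in way.findall("nd")]
    return roads

roads = load_roads(OSM_XML)
```

The ordered coordinate list of each way is the polyline that later serves as the road shape.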
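The transformation and clipping step can be sketched as follows. This is a simplified illustration under several assumptions that are not stated in the paper: a local equirectangular approximation around the vehicle, a yaw angle measured counter-clockwise from east, and a hypothetical `max_range` that stands in for the limit of the image view. The vehicle frame has x pointing forward and y pointing left.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS-84 equatorial radius

def geodetic_to_vehicle(lat, lon, lat0, lon0, yaw):
    """Map a GPS point to the vehicle frame (x forward, y left).

    (lat0, lon0, yaw) is the assumed vehicle pose; yaw is in radians,
    counter-clockwise from east.
    """
    # Local east/north offsets in metres (equirectangular approximation).
    east = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    north = math.radians(lat - lat0) * EARTH_RADIUS_M
    # Rotate the world offsets by -yaw into the vehicle frame.
    x = math.cos(yaw) * east + math.sin(yaw) * north
    y = -math.sin(yaw) * east + math.cos(yaw) * north
    return x, y

def road_in_vehicle_frame(nodes, lat0, lon0, yaw, max_range=50.0):
    """Transform road nodes and keep only those ahead of the vehicle."""
    points = [geodetic_to_vehicle(lat, lon, lat0, lon0, yaw) for lat, lon in nodes]
    return [(x, y) for x, y in points if 0.0 <= x <= max_range]

# Vehicle at (49.0, 8.4) heading north; one node ahead, one behind, one too far.
nodes = [(49.0002, 8.4), (48.9998, 8.4), (49.0010, 8.4)]
visible = road_in_vehicle_frame(nodes, 49.0, 8.4, math.pi / 2)
```

Only the node roughly 22 m ahead survives the clipping in this example; the node behind the vehicle and the node beyond `max_range` are discarded.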

B. Preprocessing
Before using the OSM road shape for ego-lane detection, lane line features and road curb features need to be extracted first. In order to improve the speed and accuracy of the algorithm, feature extraction is generally performed after ROI selection [3]. Therefore, we consider both ROI selection and lane feature extraction as preprocessing in this section.
1) ROI Selection: Among all tasks in ego-lane detection, ROI selection is usually the first step performed in most of the previous studies [28]. The main reason for focusing on ROI selection is to increase the computation efficiency and detection accuracy. In this paper, we consider the drivable area to be the ROI. It contains all lane markers and road curbs for feature extraction, and trees, buildings and other objects outside the road can be ignored. Therefore, ROI selection can be redefined as road detection.
A camera is a light-sensitive sensor that is easily affected by illumination and shadows. Although many deep learning methods have greatly improved the performance of image processing in recent years, their computational cost makes them unsuitable for the preprocessing step. Unlike the camera, 3D LiDAR is unaffected by illumination and can provide accurate geometric information about the environment. Therefore, we use 3D LiDAR for ROI selection.
To meet the real-time requirements, we project the 3D point cloud data to a 2D range image, which can achieve data compression while retaining neighborhood information.
The number of rows of the range image is defined by the number of laser beams of the 3D LiDAR. The KITTI dataset uses the Velodyne HDL-64E, so the number of rows is 64. The number of columns of the range image is the horizontal resolution of the 3D LiDAR. We only use the 90° field of view that coincides with the camera, so the number of columns is 500. In summary, the size of the range image is 64 × 500, and an example of a range image can be seen in Fig. 4(a).
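A minimal sketch of this projection is given below. It assumes that each point already carries its laser beam index (`rings`), that the LiDAR frame has x forward and y left, and that the 90° field of view is centered on the forward axis, so each of the 500 columns spans 0.18°. These conventions are assumptions of the sketch, not details taken from the paper.

```python
import math

ROWS, COLS = 64, 500          # laser beams x horizontal bins
FOV = math.radians(90.0)      # field of view shared with the camera

def to_range_image(points, rings):
    """Project 3D points (x, y, z) into a ROWS x COLS range image.

    rings[i] is the assumed laser beam index of points[i], in [0, ROWS).
    Cells without a return stay at 0.0.
    """
    image = [[0.0] * COLS for _ in range(ROWS)]
    for (x, y, z), ring in zip(points, rings):
        azimuth = math.atan2(y, x)          # 0 straight ahead, positive to the left
        if abs(azimuth) >= FOV / 2:
            continue                        # outside the camera's field of view
        # Column 0 is the left edge of the field of view.
        col = min(COLS - 1, int((FOV / 2 - azimuth) / FOV * COLS))
        image[ring][col] = math.sqrt(x * x + y * y + z * z)
    return image

# One point straight ahead and one behind the vehicle, both on beam 3.
image = to_range_image([(10.0, 0.0, 0.0), (-5.0, 0.0, 0.0)], [3, 3])
```

Neighboring pixels in the image correspond to neighboring measurements of the sensor, which is what the later region growing relies on.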
Based on the assumption that the road is flat and continuous, we do road detection on the range image using the region grow method. As the vehicle is traveling in the forward direction, the road is always located in front of the vehicle. Therefore, seed points are selected as points in front of the vehicle, which are located in the bottom center of the range image. The similarity between pixels is defined by the horizontal slope feature and the vertical slope feature.
For each pixel, the horizontal slope feature is calculated based on $k$ neighborhood points in the same laser beam, where $(x_i, y_i)$ is the position of the pixel in the 3D LiDAR coordinate system, and $\bar{X}$, $\bar{Y}$ are the average values of the $k$ neighbors. As shown in Fig. 5(a), the feature value $\alpha_A$ on the ground is close to 0, while the feature value $\alpha_B$ on the road curb is close to infinity, so the horizontal slope feature is used to detect the road curb. The features are then normalized using the logistic function, and the results are shown in Fig. 4(b).
For each pixel, the vertical slope feature is calculated based on the points on two adjacent laser beams in the same ray direction, where $(d_r, z_r)$ is a point on the $r$-th laser beam, $(d_{r+1}, z_{r+1})$ is a point on the $(r+1)$-th laser beam, and $d_r = \sqrt{x_r^2 + y_r^2}$. As shown in Fig. 5(b), the feature value $\beta_{AB}$ on the ground is close to 0, while the feature value $\beta_{BC}$ on the obstacle is close to infinity, so the vertical slope feature is used to detect obstacles. The features are then normalized using the logistic function, and the results are shown in Fig. 4(c).
After the two slope features are obtained, their weighted sum is calculated, where $a$ and $b$ are the coefficients of the horizontal slope feature and the vertical slope feature, respectively. Fig. 4(d) shows the weighted sum feature map, in which obstacles and road curbs are all detected. After obtaining the weighted sum feature map, we use horizontal and vertical region growing to obtain the road area, and the results are shown in Fig. 4(e). Finally, we project the road points onto the perspective image and use Delaunay Triangulation [42] to upsample the sparse point cloud to obtain the ROI selection result, which is shown in Fig. 4(f).
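The vertical slope feature and the weighted sum can be illustrated with the sketch below. The exact formulas are not reproduced here, so everything in it is an assumption chosen to match the described behavior: the vertical slope is taken as rise over run between two adjacent beams (0 on flat ground, infinite on a vertical obstacle), the logistic normalization is a rescaled sigmoid that maps $[0, \infty)$ to $[0, 1]$, and the horizontal slope value `alpha` is passed in as a plain number rather than computed.

```python
import math

def vertical_slope(p_r, p_r1):
    """Assumed vertical slope between points (x, y, z) on adjacent beams."""
    d_r = math.hypot(p_r[0], p_r[1])      # horizontal distance of the lower point
    d_r1 = math.hypot(p_r1[0], p_r1[1])   # horizontal distance of the upper point
    run = abs(d_r1 - d_r)
    rise = abs(p_r1[2] - p_r[2])
    if run == 0.0:
        return math.inf if rise > 0.0 else 0.0
    return rise / run

def normalize(slope):
    """Assumed logistic normalization: 0 -> 0, infinity -> 1."""
    return 2.0 / (1.0 + math.exp(-slope)) - 1.0

def weighted_feature(alpha, beta, a=0.5, b=0.5):
    """Weighted sum of the normalized horizontal and vertical slope features."""
    return a * normalize(alpha) + b * normalize(beta)

ground = vertical_slope((5.0, 0.0, -1.7), (6.0, 0.0, -1.7))   # flat road
wall = vertical_slope((5.0, 0.0, -1.7), (5.0, 0.0, -1.0))     # vertical obstacle
score = weighted_feature(0.0, wall)
```

With both weights set to 0.5, a pixel that is flat horizontally but lies on a vertical obstacle receives a score of 0.5, while a flat ground pixel receives 0.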
2) Lane Feature Extraction: As described in Section I, lane features in the KITTI Lane dataset are mainly composed of two parts: lane line features and road curb features. In ROI selection, the horizontal slope feature has a good effect on detecting road curbs, so we directly use ROI selection results as road curb features.
Lane line feature extraction aims to extract low-level features from images to support ego-lane detection, such as color, texture, and edges [43]. Among them, edges are the most common feature used in ego-lane detection for structured roads [25]. An edge is mathematically defined by the gradient of the intensity function [44], so we define the gradient through a convolution of the image, where $I$ is the image and $G$ is the calculated gradient.
In a real driving scenario, the lane line may not be parallel to the ego-car, so we use a convolution with a height of 1, which can detect inclined lane lines more stably. In the perspective image, the lane line width becomes smaller as the distance increases, so we perform feature extraction on the bird's eye view image, where the lane line width is constant and easy to detect. We found that the lane line generally takes 3 pixels on the bird's eye view, so we use a convolution with a width of 9. There is a sharp contrast between the road surface and painted lane lines, so the 3 elements in the middle of the convolution kernel are 2 and the others are −1. In this way, when there is no lane line, the intensity values of the pixels are similar and the gradient is 0; when there is a lane line, the intensity of the three elements in the middle is high, the intensity of the two sides is low, and the gradient is relatively large.
Therefore, when the gradient $G(i, j)$ is greater than $G_{th}$, the pixel at position $(i, j)$ is marked as lane. An example of the lane line feature extraction result can be seen in Fig. 6(c). It should be noted that lane line feature extraction is performed on a gray-scale image (shown in Fig. 6(a)), and pixels outside the ROI are not considered (shown in Fig. 6(b)).
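The kernel and threshold described above can be written out as a short sketch. The kernel values and the default threshold of 200 follow the text; the function names, the list-of-lists image representation, and the choice to leave border pixels at 0 are assumptions made for illustration.

```python
# 1 x 9 kernel: three central elements are 2, the other six are -1 (sums to 0).
KERNEL = (-1, -1, -1, 2, 2, 2, -1, -1, -1)

def row_gradient(row):
    """Apply the 1 x 9 kernel along one image row; borders are left at 0."""
    half = len(KERNEL) // 2
    grad = [0] * len(row)
    for j in range(half, len(row) - half):
        window = row[j - half:j + half + 1]
        grad[j] = sum(k * v for k, v in zip(KERNEL, window))
    return grad

def lane_mask(gray, roi, g_th=200):
    """Mark pixels inside the ROI whose gradient exceeds the threshold."""
    mask = []
    for gray_row, roi_row in zip(gray, roi):
        grad = row_gradient(gray_row)
        mask.append([inside and g > g_th for g, inside in zip(grad, roi_row)])
    return mask

# A bird's eye view row with a 3-pixel-wide bright lane line, and a plain row.
lane_row = [100] * 6 + [200] * 3 + [100] * 6
plain_row = [100] * 15
mask = lane_mask([lane_row, plain_row], [[True] * 15, [True] * 15])
```

Because the kernel sums to zero, the uniform row produces a gradient of 0 everywhere, while the three bright pixels of the lane line produce responses of 300, 600 and 300, all above the threshold.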

C. Ego-lane Detection
The main goal of this stage is to extract a compact high-level representation of the lane that can be used for decision making [43]. In most papers, mathematical representation models are used as compact high-level representations such as straight lines, parabolas, splines, and so on. In order to fit lane features to these mathematical representation models, Least Squares Method (LSM) and Random Sample Consensus (RANSAC) are widely used. Since mathematical representation models have a large randomness of the fitted parameters when features are missing, we exploit OSM data to enhance ego-lane detection.
As mentioned in the previous section, OSM data is provided by the volunteers, so it is very coarse and rife with errors, which is called OSM data error. At the same time, when projecting the OSM data onto the image, the approximate vehicle pose estimation causes errors in the relative position of the OSM data with respect to the vehicle, which is called vehicle positioning error.
Since we perform ego-lane detection on the 2D image plane, the errors can be eliminated by the rotation parameter $\theta$ and the translation parameters $x$, $y$ (the x-axis points in the vehicle's forward direction, while the y-axis is orthogonal to the x-axis and points to the left of the vehicle). In real urban scenes, the radius of curvature of the road is relatively large, so the translation parameter $x$ can be ignored. Therefore, we only need to consider the parameters $y$ and $\theta$, which represent the lateral offset and the heading offset, respectively. It should be noted that we perform lane line detection and road curb detection simultaneously, so the lateral offset $y$ consists of two parts: the lane line lateral offset $y_l$ and the road curb lateral offset $y_r$.
To estimate these three parameters, we minimize the distance from the detected lane features to the OSM data. Since the OSM road shape consists of a series of points and their connections, the distance from a feature point to the OSM data is equal to the distance from the feature point to its nearest connection, where $(x_i^P, y_i^P)$ is the $i$-th feature point, and $(x_j^Q, y_j^Q)$ and $(x_k^Q, y_k^Q)$ are the two adjacent OSM points closest to the feature point. The optimization function minimizes this distance over all feature points, where $m$ is the number of feature points, $y_{max}$ is the maximum value of the lateral offset, and $\theta_{max}$ is the maximum value of the heading offset. This optimization problem is very difficult to solve directly, because the OSM line closest to each feature point has to be found. Therefore, we rely on a search-based algorithm to find an approximately optimal solution. The basic idea is to iterate through all possible values of these three parameters. After iterating over all parameters and obtaining all corresponding distances, we look for the parameters that achieve the smallest distance. However, the time complexity of looping through these three parameters jointly is $O(N^3)$, which is very time-consuming and cannot meet the real-time requirements. Therefore, we optimize these three parameters separately, so that the time complexity is reduced to $O(3N)$.
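The distance from a feature point to its nearest OSM connection can be sketched as follows. The sketch uses the standard point-to-segment distance with the projection clamped to the segment ends, which is one reasonable reading of "distance to the nearest connection"; the paper's own formula may instead use the distance to the infinite line through the two OSM points. The function names are introduced here for illustration.

```python
import math

def point_segment_distance(p, q_j, q_k):
    """Distance from point p to the segment between OSM points q_j and q_k."""
    dx, dy = q_k[0] - q_j[0], q_k[1] - q_j[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(p[0] - q_j[0], p[1] - q_j[1])
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((p[0] - q_j[0]) * dx + (p[1] - q_j[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (q_j[0] + t * dx), p[1] - (q_j[1] + t * dy))

def polyline_distance(p, polyline):
    """Distance from p to the nearest connection of an OSM polyline."""
    return min(point_segment_distance(p, a, b)
               for a, b in zip(polyline, polyline[1:]))

road = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
d_mid = polyline_distance((5.0, 3.0), road)    # nearest to the first segment
d_side = polyline_distance((12.0, 5.0), road)  # nearest to the second segment
d_end = polyline_distance((-3.0, 4.0), road)   # nearest to the first OSM point
```

The inner `min` over all connections is the step that makes the objective hard to optimize analytically, since the nearest connection can change as the OSM shape is shifted or rotated.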

Algorithm 1 Search-Based Parameters Optimization
Input: feature points $P \in \mathbb{R}^{m \times 2}$, OSM points $Q \in \mathbb{R}^{n \times 2}$
Output: optimization parameter $\gamma^*$
1: $d_{min} \leftarrow +\infty$
2: $\gamma^* \leftarrow 0$
3: for $\gamma = -\gamma_{max}$ to $\gamma_{max}$ step $\delta_\gamma$ do
4: for p in P do
6: for q in Q do descending order
7: transform q to q′
8: ...

The inputs to this algorithm are $m$ feature points $P$ and $n$ OSM points $Q$. The output of this algorithm is the optimized parameter $\gamma^*$. As we optimize the three parameters separately, $\gamma$ represents the heading offset $\theta$, the lane line lateral offset $y_l$, or the road curb lateral offset $y_r$, depending on the optimization step. In line 3, all possible values are traversed given the maximum value of the parameter. From line 4 to line 13, the distance from the feature points to the OSM data is calculated. The optimal parameter that achieves the smallest distance is selected in line 14 to line 17.
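The search strategy can be sketched in a few lines. This is a simplified illustration rather than a reproduction of Algorithm 1: it handles a single boundary with one lateral offset and one heading offset, uses a clamped point-to-segment distance, rotates the OSM shape about the vehicle origin, and scans the lateral offset before the heading offset. All of these choices, as well as the function names, are assumptions of the sketch; the default ranges and step sizes mirror the experimental settings.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from p to segment ab (projection clamped to the segment)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def total_distance(features, polyline):
    """Sum over feature points of the distance to the nearest OSM connection."""
    segments = list(zip(polyline, polyline[1:]))
    return sum(min(point_segment_distance(p, a, b) for a, b in segments)
               for p in features)

def transform(polyline, theta, y_off):
    """Rotate the OSM shape by theta about the origin, then shift it laterally."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y + y_off) for x, y in polyline]

def search_1d(cost, gamma_max, step):
    """Scan gamma over [-gamma_max, gamma_max] and keep the best value."""
    n = int(round(gamma_max / step))
    best_gamma, d_min = 0.0, math.inf
    for i in range(-n, n + 1):
        gamma = i * step
        d = cost(gamma)
        if d < d_min:
            best_gamma, d_min = gamma, d
    return best_gamma

def fit_boundary(features, osm, y_max=100.0, d_y=5.0, theta_max=0.1, d_theta=0.005):
    """Optimize the two parameters one at a time instead of jointly."""
    y = search_1d(lambda v: total_distance(features, transform(osm, 0.0, v)),
                  y_max, d_y)
    theta = search_1d(lambda t: total_distance(features, transform(osm, t, y)),
                      theta_max, d_theta)
    return theta, y

# A straight OSM road and features that lie 20 units to its left.
osm = [(0.0, 0.0), (50.0, 0.0), (100.0, 0.0)]
features = [(float(x), 20.0) for x in range(10, 100, 10)]
theta, y = fit_boundary(features, osm)
```

In this example the scan recovers a lateral offset of 20 and a heading offset of 0. Because each parameter is scanned while the others are held fixed, the result is in general only an approximation of the joint optimum, which is the trade-off accepted in exchange for the lower complexity.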
After obtaining the optimization results of the left and right boundaries, we use the vertical slope feature (mentioned in the ROI selection subsection) to detect all obstacles between two boundaries, and take the point closest to the origin as the upper boundary. In this way, the result of ego-lane detection is the area surrounded by these three boundaries. Fig. 7 shows the ego-lane detection results of the scenarios corresponding to Fig. 2. In (a), the significant lateral error is completely eliminated, and the OSM road shape perfectly coincides with the lane boundaries. It can be seen from (b) that the significant heading error is completely eliminated, except for some slight errors between the OSM road shape and the lane boundaries shape.

IV. EXPERIMENTAL EVALUATION
In order to evaluate the accuracy and real-time performance of our algorithm, we test it on the public KITTI Lane benchmark. All algorithms are implemented in C++ with PCL (Point Cloud Library) and OpenCV (Open Source Computer Vision Library), and run on a laptop computer with an Intel i5-8265U 1.66 GHz CPU and 8 GB of main memory.
Experimental Setup
1) KITTI Lane Benchmark: The KITTI Lane benchmark [45] is a widely used benchmark for ego-lane detection. It includes 95 training samples and 96 testing samples collected in various urban scenes with marked lanes. The evaluation metrics include maximum F1-measure (MaxF), average precision (AP), precision (PRE), recall (REC), false positive rate (FPR), and false negative rate (FNR), where MaxF is used as the primary metric for comparison between different methods.
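For reference, the F1-measure is the harmonic mean of precision and recall, and MaxF is its maximum over the confidence thresholds. The sketch below illustrates this on a handful of hypothetical per-pixel scores; it is not the official KITTI evaluation code, and the function names and sample values are made up for illustration.

```python
def f_measure(tp, fp, fn):
    """F1-measure from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def max_f(scores, labels, thresholds):
    """Maximum F1-measure over a set of confidence thresholds."""
    best = 0.0
    for th in thresholds:
        tp = sum(1 for s, l in zip(scores, labels) if s > th and l)
        fp = sum(1 for s, l in zip(scores, labels) if s > th and not l)
        fn = sum(1 for s, l in zip(scores, labels) if s <= th and l)
        best = max(best, f_measure(tp, fp, fn))
    return best

# Hypothetical confidences for four pixels and their ground-truth labels.
scores = [0.9, 0.8, 0.4, 0.2]
labels = [1, 1, 0, 1]
best = max_f(scores, labels, thresholds=[0.1, 0.5, 0.85])
```

In this toy example the lowest threshold gives a precision of 0.75 and a recall of 1, which yields the maximum F1 of 6/7.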
2) Experiments Setting: For ROI selection, the weighting coefficient of the horizontal slope feature a is 0.5, and the weighting coefficient of the vertical slope feature b is 0.5.
For lane feature extraction, the gradient threshold $G_{th}$ is 200.
For ego-lane detection, the maximum lateral error $y_{max}$ is 100 pixels and the step size $\delta_y$ is 5 pixels; the maximum heading error $\theta_{max}$ is 0.1 radians and the step size $\delta_\theta$ is 0.005 radians.
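With these settings, the size of the search space can be worked out directly. The short sketch below assumes that both lateral offsets share the same range and step, and counts the candidate values per parameter as well as the number of evaluations for a joint search versus separate searches.

```python
def grid_size(max_value, step):
    """Number of candidates when scanning [-max_value, max_value] with a step."""
    return int(round(2 * max_value / step)) + 1

n_y = grid_size(100, 5)          # lateral offset candidates (pixels)
n_theta = grid_size(0.1, 0.005)  # heading offset candidates (radians)

joint = n_theta * n_y * n_y      # scanning all three parameters together
separate = n_theta + n_y + n_y   # scanning one parameter at a time
```

Each parameter has 41 candidate values, so a joint search would evaluate 68921 combinations while the separate searches evaluate only 123.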

A. Performance Evaluation
We tested our method on the KITTI Lane benchmark and compared it with other state-of-the-art methods, including NVLaneNet, RoadNet3, MANLDF, RBNet [46], Up-Conv-Poly [47], SPRAY [48], SPlane + BL, DH-OCR [49] and SCRFFPFHGSP [50]. All results are evaluated on the KITTI evaluation server, and the performance of the algorithms is shown in Table I.
The results show that the proposed method achieved 93.56% in the MaxF score, which is 1.70% higher than the previous state-of-the-art method. The improvement of the MaxF score is mainly due to the fact that our PRE can reach 95.94%, and this is precisely because we use OSM road shape as the lane model, which can accurately detect lane boundaries and further achieve higher accuracy.

B. Robustness to Missing Feature
In order to validate the robustness of the proposed algorithm to the missing feature problem, we perform model comparison experiments on the training dataset. The mathematical representation models used for comparison include the straight line, circular arc, quadratic polynomial, and cubic spline. To make the comparison more convincing, we down-sampled the features with sampling ratios from 0% to 100%. It should be noted that we use the ground truth as lane features, which avoids the interference of noise and thus ensures the fairness of the experiment. The evaluation metric is MaxF, and the experimental results are shown in Fig. 8. It can be seen that the fitting results of all mathematical representation models become worse as the number of features decreases. However, since the OSM road shape is used as the lane model, our method is very robust to missing features. Even if the number of features decreases, its performance remains basically unchanged. At the same time, in some extreme scenarios, such as no visible features (Fig. 1(c)), we directly use the OSM road shape as the lane boundary, and the MaxF can reach 88.23%, while other mathematical representation models cannot handle this scenario.

C. Runtime
Since our algorithm is intended for autonomous driving systems, a shorter runtime allows the system to obtain information about the surrounding environment earlier, thereby improving safety. As shown in Fig. ??, the runtime of our algorithm on both the training and testing datasets averages around 50 ms. This corresponds to a frequency of 20 Hz, which is twice the rotation rate of the 3D LiDAR, so our algorithm can be used safely on autonomous driving systems.

D. Qualitative Results
Some detection results of our method in the perspective view and the bird's eye view of the image are shown in Fig. 10 and Fig. 11, respectively. It can be seen that our method is very robust to missing features caused by lane marking wear, lighting changes, no visible features, etc.

V. CONCLUSION
In this study, we employ the OSM road shape as the lane model to enhance our ego-lane detection algorithm, which makes it robust to challenging missing feature scenarios. At the same time, to eliminate the position error between the OSM data and the real lane, a search-based optimization algorithm is proposed to improve the accuracy of the algorithm. We validate the algorithm on the well-known KITTI Lane benchmark, where it achieves state-of-the-art performance in terms of accuracy and real-time performance. In future work, in order to obtain more accurate ego-lane detection results, the OSM road shape error should also be eliminated.