LiDAR-Based Multi-Task Road Perception Network for Autonomous Vehicles


ABSTRACT For autonomous vehicles, obtaining integrated static road information in real time in dynamic driving environments is an important requirement. A comprehensive perception of the surrounding road should cover the accurate detection of the entire road area despite occlusion, the 3D geometry of the road, and the type of road topology, in order to facilitate practical applications in autonomous driving. To this end, we propose a lightweight and efficient LiDAR-based multi-task road perception network (LMRoadNet) that conducts occlusion-free road segmentation, road ground height estimation, and road topology recognition simultaneously. To optimize the proposed network, a corresponding multi-task dataset, named MultiRoad, is built semi-automatically on the basis of the public SemanticKITTI dataset. Specifically, our network architecture uses road segmentation as the main task, and the remaining two tasks are decoded directly from a concentrated 1/4-scale feature map derived from the main task's feature maps at different scales and phases, which significantly reduces the complexity of the overall network while achieving high performance. In addition, a loss function with a learnable weight for each task is adopted to train the network, which effectively balances the loss of each task and improves the performance of the individual tasks. Extensive experiments on the test set show that the proposed network achieves strong performance on all three tasks in real time, outperforms the conventional multi-task architecture, and is comparable to state-of-the-art efficient methods. Finally, a fusion strategy is proposed to combine results from different directions to expand the field of view for practical applications.
INDEX TERMS Autonomous vehicles, dense height estimation, multi-task learning, occlusion-free road segmentation, road topology recognition, 3D LiDAR.

I. INTRODUCTION
Understanding the layout and shape of the road near the ego vehicle is the basis of safe autonomous driving. A comprehensive perception of the surrounding road not only includes the accurate detection of the road area, but also involves global semantic information about the road topology, such as the presence and type of an intersection, as it defines the scenario, provides context information, and constrains the future motion of traffic participants. Moreover, to facilitate practical applications in autonomous vehicles, 3D information about the road is also necessary, since the ground is not always flat. The problem of road perception has been investigated for many years, and a large variety of approaches can be found in the literature. As to road area detection, free-space road detection and road boundary detection are two popular fields. Free-space road detection methods [1]-[7] focus on the obstacle-free road region that vehicles could drive on, but this representation conflates static road areas with on-road dynamic objects, which is not sufficient for planning in complex driving scenes. Road boundaries [8]-[14] are commonly represented by curves such as Bezier splines and cubic splines, which cannot describe complex road shapes such as intersections. On the other hand, the 3D information of the road is an important factor in controlling a vehicle. Some researchers [15], [16] pay attention to the lateral and longitudinal road slopes, while others [17], [18] are interested in reconstructing the road surface with various models.
At the global level, road topology recognition methods [19]-[23] aim to understand what type of road the ego vehicle is approaching. However, most of the literature mentioned above focuses on only a single aspect of road perception; even if each single task is not computationally expensive, conducting all the tasks necessary for autonomous driving one by one is time consuming.
In recent years, vision-based methods have benefited significantly from deep learning. However, vision-based road detection suffers from ambiguity caused by the loss of 3D information in real autonomous driving applications. To tackle this, some methods [2], [5], [7], [22], [24] have been proposed that transform the LiDAR point cloud into a front-view or top-view representation, which preserves the 3D information while taking advantage of DCNN techniques. In [7], the LiDAR data are represented as a multi-channel 2D signal, where the horizontal axis corresponds to the rotation angle and the vertical axis to the laser beams, for ground segmentation using a convolutional neural network. In [5], a deep learning approach has been developed to carry out road detection using only LiDAR data in a top-view representation. In [24], to enhance object detection with geometric and semantic priors, the authors first predict road height and segmentation in a top-view 3D occupancy grid of the LiDAR point cloud. Although both transformed representations are able to increase accuracy and efficiency, a top-view representation is, in our opinion, more appropriate than a front-view representation, given that both path planning and vehicle control are easy to conduct on the top view.
In addition, multi-task deep learning [10], [23], [25]-[27] has attracted the attention of researchers due to its potential to boost the performance of each individual task and improve the efficiency of the overall network. In RBNet [10], a Bayesian model is implemented so that the network can learn to estimate the road and the road boundary simultaneously. MultiNet [26] combines classification, detection, and segmentation in a single architecture, which is based on ResNet [28] and consists of three shared encoding layers followed by task-independent decoding layers. Reference [29] introduces an efficient approach for simultaneous object detection, depth estimation, and pixel-level semantic segmentation using a shared convolutional architecture. Reference [27] proposes a unified neural network to detect drivable areas, lane lines, and traffic objects simultaneously, the three tasks that are most important for autonomous driving. Beyond the mechanism of parameter sharing, the most challenging problem in multi-task learning is how to balance the loss weight of each task in the training phase. Early works always use a weighted sum of the individual task losses, which may even decrease performance when the loss weights are not properly selected. Recently, [30] proposed a novel and principled multi-task loss that simultaneously learns various classification and regression losses of varying quantities and units using homoscedastic task uncertainty, demonstrating the superiority and effectiveness of homoscedastic uncertainty for multi-task learning.
To cope with the challenging road perception problems mentioned above and obtain integrated road area detection despite occlusion, together with 3D information and a high-level understanding of the road type, we propose a multi-task network performing occlusion-free road segmentation (ORS), dense road ground height estimation (DHE), and road topology recognition (RTR) simultaneously, as shown in Figure 1. The proposed multi-task network, named LMRoadNet, takes the top-view representation of the LiDAR point cloud as input.
The main contributions of our work are as follows: 1) A lightweight and efficient multi-task road perception network is proposed to perform occlusion-free road segmentation, dense road height estimation, and road topology recognition simultaneously in real time; its architecture is elaborately designed to obtain a trade-off between accuracy and runtime.
2) To train and test the network, a multi-task road perception dataset denoted MultiRoad was built semi-automatically based on the public SemanticKITTI dataset.
3) A fusion strategy is proposed to combine the results of the proposed model in different directions, which can flexibly and efficiently expand the field of view depending on the complexity of the driving scenario and the onboard computing power in real applications.
The remainder of this paper is organized as follows. Section II describes the overview of our method for multi-task road perception. Section III describes the procedure of building the multi-task dataset in detail. Section IV evaluates the proposed method through comprehensive experiments. Section V summarizes the contributions of this paper.

A. TASK DEFINITION
In our work, we perform three related and complementary tasks in the road perception field from a single LiDAR input. The first task is occlusion-free road segmentation (ORS), which aims to obtain the full road area despite occlusion. The second task is dense road height estimation (DHE), which estimates the height of each road cell in the grid map. The third task, road topology recognition (RTR), intends to understand the global property of the road shape. The input is a grid-based top-view representation of the unstructured point cloud. In our work, 5 basic statistics are computed for each cell of the grid $Grid$: $N_i$, the number of points; $z_i^{min}$, $z_i^{mean}$, and $z_i^{max}$, the minimum, mean, and maximum height of the points; and $I_i$, the mean reflectivity of the points in the $i$-th cell, with $N_G$ the number of cells in the grid. The input of the network is $Grid \in \mathbb{R}^{H \times W \times 5}$, the output of the ORS task $RS \in \mathbb{R}^{H \times W \times 2}$, the output of the DHE task $DH \in \mathbb{R}^{H \times W \times 1}$, and the output of the RTR task $RT \in \mathbb{R}^{7}$, where $H$ is the height of the grid and $W$ its width.

B. NETWORK ARCHITECTURE
With efficiency and effectiveness in mind, the proposed network, named LMRoadNet, was designed to get the best possible trade-off between accuracy and runtime. The network jointly reasons about the ORS, DHE, and RTR tasks. LMRoadNet can be trained end-to-end and performs joint inference over all tasks in less than 15 ms. Unlike the usual multi-task learning architecture that contains an encoder and several task-specific decoders, the proposed architecture decodes the other tasks from an intermediate-scale feature map derived from the main task's feature maps at different scales and phases, which significantly reduces the number of parameters and the computational complexity while retaining high accuracy. The proposed model is visualized in Figure 2.

1) OCCLUSION-FREE ROAD SEGMENTATION
In our work, the ORS task is chosen as the main task for two reasons. One is that the ORS task extracts more detailed features in the process. The other is that the remaining two tasks are highly related to, and can benefit from, the ORS task. To capture useful features more efficiently, we use MixNet [31] as the backbone, whose basic component is the mixed depth-wise convolution (MixConv), which naturally mixes multiple kernel sizes in a single convolution so that it can easily capture different patterns at various resolutions. To further reduce the computational complexity, only the first 4 stages of MixNet are used, producing a 1/16-scale feature map.
To perform the ORS task on top of the backbone, we propose a Joint Up-sampling Module (JUM) that performs up-sampling on two feature maps from different stages. The feature map from the earlier stage, with larger resolution and fewer channels, carries sufficient spatial detail, while the feature map from the later stage, with smaller resolution and more channels, contains the necessary context. Thus the JUM takes advantage of both detailed and global information to infer not only the free road cells but also the occupied and occluded road cells. As shown in Figure 2, the JUM up-samples the small-scale feature map to the resolution of the larger feature map, followed by a 1 × 1 convolutional layer. The larger-resolution feature map passes through a 1 × 1 convolution before the two branches are concatenated. Then a Squeeze-and-Excitation (SE) block [32] is used to adaptively recalibrate channel-wise feature responses by explicitly modelling interdependencies between channels, followed by two 3 × 3 convolutional layers. The JUM works in a cascading way to obtain a 1/2-scale feature map, and finally a deconvolution layer transforms the features to the original resolution.
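The JUM described above can be sketched in PyTorch as follows. This is a minimal illustration of the fusion pattern (up-sample, 1 × 1 convolutions, concatenation, SE recalibration, two 3 × 3 convolutions); the channel widths and the SE reduction ratio are our own assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: channel-wise recalibration [32]."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))      # global average pool -> (B, C)
        return x * w[:, :, None, None]       # re-weight each channel

class JUM(nn.Module):
    """Joint Up-sampling Module: fuses a small (late, contextual) and a
    large (early, detailed) feature map from different backbone stages."""
    def __init__(self, c_small, c_large, c_out):
        super().__init__()
        self.conv_small = nn.Conv2d(c_small, c_out, 1)  # 1x1 after up-sampling
        self.conv_large = nn.Conv2d(c_large, c_out, 1)  # 1x1 on the large branch
        self.se = SEBlock(2 * c_out)
        self.refine = nn.Sequential(
            nn.Conv2d(2 * c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, f_small, f_large):
        up = F.interpolate(f_small, size=f_large.shape[2:],
                           mode='bilinear', align_corners=False)
        x = torch.cat([self.conv_small(up), self.conv_large(f_large)], dim=1)
        return self.se(x).pipe if False else self.refine(self.se(x))
```

Cascading two or three such modules, as the paper describes, carries the 1/16-scale backbone output up to 1/2 scale before the final deconvolution.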

2) DENSE ROAD HEIGHT ESTIMATION
To efficiently utilize the features from the main task, the DHE task performs height estimation on the concatenated features coming from a Feature Sharing Module (FSM), which merges features of different scales and stages of the main task and can provide suitable features for the other tasks. For that reason, even a simple decoder is able to obtain an accurate estimate of the height of each cell. As shown in Figure 2, the feature maps are transformed before concatenation if they do not share the same resolution; for example, F2 is processed with up-sampling and a 1 × 1 convolution, and F4 with max-pooling. The DHE decoder first uses a 1 × 1 convolutional layer to capture more task-specific features, and then up-samples the features to the original resolution. Finally, two 1 × 1 convolutional layers with 32 and 1 channels respectively are applied to obtain the height estimation result.
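The FSM alignment and the lightweight DHE decoder can be sketched as below. Which maps are up-sampled versus max-pooled to reach the common 1/4 scale, and the intermediate channel counts, are illustrative assumptions based on the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FSM(nn.Module):
    """Feature Sharing Module: aligns feature maps from different stages of
    the main task to a common 1/4 scale and concatenates them. Here F2 is
    assumed coarser than 1/4 scale (up-sample + 1x1 conv), F3 already at
    1/4 scale, and F4 finer (max-pool)."""
    def __init__(self, c2, c_out=64):
        super().__init__()
        self.conv2 = nn.Conv2d(c2, c_out, 1)   # 1x1 conv after up-sampling F2
        self.pool4 = nn.MaxPool2d(2)           # max-pool F4 down to 1/4 scale

    def forward(self, f2, f3, f4):
        f2 = self.conv2(F.interpolate(f2, size=f3.shape[2:],
                                      mode='bilinear', align_corners=False))
        return torch.cat([f2, f3, self.pool4(f4)], dim=1)

class DHEDecoder(nn.Module):
    """Simple height-regression head on the shared 1/4-scale features."""
    def __init__(self, c_in, c_mid=64):
        super().__init__()
        self.task = nn.Conv2d(c_in, c_mid, 1)  # task-specific 1x1 conv
        self.head = nn.Sequential(nn.Conv2d(c_mid, 32, 1),  # 32 channels
                                  nn.Conv2d(32, 1, 1))      # 1-channel height

    def forward(self, x, out_size):
        x = F.relu(self.task(x))
        x = F.interpolate(x, size=out_size, mode='bilinear',
                          align_corners=False)  # back to original resolution
        return self.head(x)                     # (B, 1, H, W) height map
```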

3) ROAD TOPOLOGY RECOGNITION
The RTR task shares the feature sharing module with the DHE task. It also first applies a 1 × 1 convolutional layer to extract task-specific features, then an average pooling layer, after which two 1 × 1 convolutional layers act as fully connected layers to predict the road topology type. The final output has 7 channels, corresponding to the number of road topology classes.
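The RTR head reduces to a few lines; the sketch below follows the sequence just described (task-specific 1 × 1 conv, global average pooling, two 1 × 1 convs as fully connected layers over 7 classes), with the hidden width being our own assumption.

```python
import torch
import torch.nn as nn

class RTRHead(nn.Module):
    """Road topology classifier on the shared 1/4-scale features."""
    def __init__(self, c_in, n_classes=7, c_mid=64):
        super().__init__()
        self.task = nn.Conv2d(c_in, c_mid, 1)   # task-specific 1x1 conv
        self.pool = nn.AdaptiveAvgPool2d(1)     # global average pooling
        self.fc = nn.Sequential(                # 1x1 convs acting as FC layers
            nn.Conv2d(c_mid, c_mid, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c_mid, n_classes, 1))

    def forward(self, x):
        x = self.pool(torch.relu(self.task(x)))
        return self.fc(x).flatten(1)            # (B, 7) topology logits
```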

C. LOSS FUNCTION
For the ORS and RTR tasks, we use the cross-entropy loss for these pixel-level and image-level classification tasks, denoted $L_{ors}$ and $L_{rtr}$ respectively. For the DHE task, the L1 loss is applied on the road area, while the non-road area is ignored in the training phase; this loss is denoted $L_{dhe}$:

$$L_{dhe} = \frac{1}{N} \sum_{i} \left| p_i - q_i \right| \tag{1}$$

where $N$ is the number of valid cells belonging to the road area, $p_i$ the predicted height, $q_i$ the ground-truth height, and $i$ the cell index in the grid. For multi-task learning, it is essential to appropriately balance the loss of each task to obtain a total loss when training the network jointly. Usually, the total loss is a weighted linear sum of the losses of the individual tasks, as in (2), which is referred to as the fixed loss weight strategy in this paper:

$$L_{total} = \sum_{i} \lambda_i L_i \tag{2}$$

where $L_i$ is the loss of the $i$-th task and $\lambda_i$ its loss weight. However, it is expensive and difficult to manually search for an optimal weighting. Thus, instead of manually tuning the weight of each task to account for the differing variances and offsets amongst the single-task losses, we follow [30] and add the loss weight of each task to the learnable network parameters, as in (3):

$$L_{total} = \sum_{i} \frac{1}{2\sigma_i^2} L_i + \log \sigma_i \tag{3}$$

where $\sigma_i$ is the learnable uncertainty parameter of the $i$-th task. To avoid gradient explosion due to a potential division by zero, our network predicts $s_i = \log \sigma_i^2$ instead of $\sigma_i$ in practice, and the total loss is defined as (4), which is applied in the learnable loss weight strategy:

$$L_{total} = \sum_{i} \frac{1}{2} \exp(-s_i) L_i + \frac{1}{2} s_i \tag{4}$$

where $s_i$ is the learnable parameter of each task used to balance the losses of the individual tasks. In practice, the loss weight of the RTR task does not converge as those of the other two tasks do, because the RTR task is a global classification task that is better optimized after the other tasks. However, the rapid change of the RTR loss weight in the last training phase hurts the accuracy of the other tasks, so we fix the weights of the tasks in the last few epochs, which makes the training more robust.
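The learnable loss weighting can be implemented as a small module holding one parameter per task. The sketch below follows the homoscedastic-uncertainty formulation of [30] in the $s_i = \log \sigma_i^2$ parameterization; the uniform $\tfrac{1}{2}$ coefficients are an assumption of this sketch.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Learnable loss weighting via homoscedastic task uncertainty [30]:
    L_total = sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i,
    with s_i = log(sigma_i^2) a learnable parameter (predicting s instead
    of sigma avoids a potential division by zero)."""
    def __init__(self, n_tasks=3):
        super().__init__()
        self.s = nn.Parameter(torch.zeros(n_tasks))  # s_i = log sigma_i^2

    def forward(self, losses):
        total = 0.0
        for i, loss_i in enumerate(losses):
            total = total + 0.5 * torch.exp(-self.s[i]) * loss_i \
                          + 0.5 * self.s[i]
        return total
```

Freezing the weights for the last epochs, as the paper does, amounts to calling `self.s.requires_grad_(False)` before those epochs.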

III. DATASET DESCRIPTION
A. BUILDING DATASET
To train and test our algorithm, we built a multi-task dataset named MultiRoad based on the SemanticKITTI dataset [33], which provides dense semantic annotations for each individual LiDAR scan of sequences 00-10 of the odometry task of the large-scale KITTI Vision Benchmark [2]. The labeled point clouds are recorded in sequences at a rate of 10 Hz, which enables the usage of multiple sequential scans for semantic scene interpretation, such as semantic segmentation and semantic scene completion, and the aggregation of information over multiple scans. Instead of using the unstructured point cloud directly, we transform it into a grid-based top-view representation so that the proposed multi-task learning method can utilize the powerful convolutional neural networks of the computer vision field. The first step of the procedure is to create a grid in the x-y plane of the LiDAR and to assign each element of the point cloud to one of its cells. The grid covers a region 30 meters wide, y ∈ [−15, 15], and 46 meters long, x ∈ [0, 46]; its cells are squares of size 0.10 × 0.10 meters. Some basic statistics are then computed for each grid cell: the number of points; the mean reflectivity; and the minimum, mean, and maximum height of the points in the cell. Finally, five images, one for each of the above statistics, are generated by viewing the grid cells as pixels. Given the chosen cell size and grid range, these top-view images have a resolution of 460 × 300 pixels.
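The rasterization step above can be sketched with NumPy as follows. The function name and the zero-filling of empty cells are our own choices; the ranges, cell size, and five statistics follow the text.

```python
import numpy as np

def rasterize(points, x_range=(0.0, 46.0), y_range=(-15.0, 15.0), cell=0.10):
    """Project a LiDAR point cloud (N, 4): x, y, z, reflectivity onto a
    top-view grid and compute the five per-cell statistics: point count,
    min/mean/max height, and mean reflectivity. Empty cells are zero."""
    H = int(round((x_range[1] - x_range[0]) / cell))   # 460 rows
    W = int(round((y_range[1] - y_range[0]) / cell))   # 300 cols
    x, y, z, r = points.T
    keep = (x >= x_range[0]) & (x < x_range[1]) & \
           (y >= y_range[0]) & (y < y_range[1])
    x, y, z, r = x[keep], y[keep], z[keep], r[keep]
    row = ((x - x_range[0]) / cell).astype(int)
    col = ((y - y_range[0]) / cell).astype(int)
    idx = row * W + col                                # flat cell index
    n = np.bincount(idx, minlength=H * W).astype(np.float64)
    zmin = np.full(H * W, np.inf); np.minimum.at(zmin, idx, z)
    zmax = np.full(H * W, -np.inf); np.maximum.at(zmax, idx, z)
    zsum = np.bincount(idx, weights=z, minlength=H * W)
    rsum = np.bincount(idx, weights=r, minlength=H * W)
    occ = n > 0
    grid = np.zeros((H, W, 5), dtype=np.float32)
    grid[..., 0] = n.reshape(H, W)                                  # count
    grid[..., 1] = np.where(occ, zmin, 0).reshape(H, W)             # min z
    grid[..., 2] = np.where(occ, zsum / np.maximum(n, 1), 0).reshape(H, W)
    grid[..., 3] = np.where(occ, zmax, 0).reshape(H, W)             # max z
    grid[..., 4] = np.where(occ, rsum / np.maximum(n, 1), 0).reshape(H, W)
    return grid
```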
We labeled samples at an interval of 20 frames. Only a few empty cells, containing no points, remain in the grid when it is generated from the aggregated multiple scans. For the annotation of the ORS task, coarse annotations were obtained from multiple scans aggregated around the selected scan frame. The cells containing points of the road class are assigned to the road category and the other cells to the non-road category. Then we manually refined the road segmentation annotation by completing the empty road cells (road cells containing no points) and the occupied road cells (road cells occluded by on-road objects).
For the annotation of the DHE task, the key idea is to find seed cells and derive the height value of the other cells from their spatially neighboring seed cells. A seed cell is a cell with a high probability that no object stands on the ground within it, so that its height can be directly assigned as the mean height of the points in the cell. To find the seed cells, the height difference $H_i$ and the density $DS_i$ of the cell are used to calculate this probability.
The height difference $H_i$ is defined as:

$$H_i = z_i^{max} - z_i^{min} \tag{5}$$

where $z_i^{max}$ is the maximum height of the points in the cell and $z_i^{min}$ the minimum height. The density of the cell is obtained by normalizing the number of points in the cell:

$$DS_i = \frac{N_i}{N_{max}} \tag{6}$$

where $N_i$ is the number of points in the cell and $N_{max}$ a normalization constant. The two features are transformed into probabilities $p_1(C_i)$ and $p_2(C_i)$ of the cell being a free road cell by the Gaussian kernels (7) and (8):

$$p_1(C_i) = \exp\left(-\frac{(H_i - \mu_H)^2}{2\sigma_H^2}\right) \tag{7}$$

where $H_i$ is the height difference of the cell, $\mu_H$ its mean, and $\sigma_H^2$ its variance.

$$p_2(C_i) = \exp\left(-\frac{(DS_i - \mu_{DS})^2}{2\sigma_{DS}^2}\right) \tag{8}$$

where $DS_i$ is the density of the cell, $\mu_{DS}$ its mean, and $\sigma_{DS}^2$ its variance. The two probabilities are then fused by a binary Bayes filter to obtain the final probability $p(C_i)$ of the cell being a seed cell:

$$p(C_i) = odd^{-1}\big(odd(p_1(C_i)) \cdot odd(p_2(C_i))\big) \tag{9}$$

where $odd(p) = p/(1-p)$ and $odd^{-1}$ is the inverse odds function.
In practice, a cell with probability higher than 0.7 is set as a seed cell. The height of each remaining cell is calculated as a weighted sum of the heights of the surrounding seed cells:

$$z_i = \frac{\sum_{j \in NE_i} \alpha_{i,j}\, z_j^{mean}}{\sum_{j \in NE_i} \alpha_{i,j}} \tag{10}$$

where $z_j^{mean}$ is the mean height of the points in seed cell $C_j$, $NE_i$ the set of seed cells surrounding $C_i$, and $\alpha_{i,j}$ the weight of each seed cell. Each surrounding seed cell's weight is measured by its distance and probability using Gaussian metrics, as follows:

$$\alpha_{i,j} = \exp\left(-\frac{(d_{i,j} - \mu_d)^2}{2\sigma_d^2}\right) \exp\left(-\frac{(p(C_j) - \mu_p)^2}{2\sigma_p^2}\right) \tag{11}$$

where $d_{i,j}$ is the distance between the target cell and the seed cell, $\mu_d$ and $\sigma_d^2$ the corresponding mean and variance, $p(C_j)$ the probability of the seed cell being a seed cell, and $\mu_p$ and $\sigma_p^2$ its mean and variance. Specially, we use a dynamic neighbor region for the non-seed cells, which handles the sparsity of the non-empty cells in distant areas. In detail, we initialize the neighbor region radius $\varepsilon$ as 5 cells and expand it until the number of seed cells in the neighborhood exceeds the defined threshold $T_n$:

$$NE_i = \{C_j \mid d_{i,j} \leq \varepsilon\} \tag{12}$$

$$\varepsilon = \min\{\varepsilon \geq T_\varepsilon : |NE_i| \geq T_n\} \tag{13}$$

where $T_\varepsilon$ is the minimum threshold of $\varepsilon$ and $T_n$ the minimum number of seed cells in the neighborhood. The overall procedure is given in Algorithm 1. For the annotation of the RTR task, we classified the common road layouts into 7 categories: straight road, left turn, right turn, left side road, right side road, T intersection, and crossroad, as shown in Figure 3(g). The samples were manually annotated with road topology ground-truth labels with respect to the region from 0 to 46 meters in front of the ego vehicle.
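The seed-cell scoring and height interpolation above can be sketched as follows. All kernel parameters (`mu_h`, `sigma_h`, `mu_ds`, `sigma_ds`, `sigma_d`, `sigma_p`) are illustrative placeholders, not the values used in the paper.

```python
import numpy as np

def seed_probability(h_diff, density, mu_h=0.0, sigma_h=0.1,
                     mu_ds=1.0, sigma_ds=0.3):
    """Fuse the height-difference and density cues with Gaussian kernels
    and a binary Bayes filter, as in Eqs. (7)-(9)."""
    p1 = np.exp(-(h_diff - mu_h) ** 2 / (2 * sigma_h ** 2))
    p2 = np.exp(-(density - mu_ds) ** 2 / (2 * sigma_ds ** 2))
    p1 = np.clip(p1, 1e-6, 1 - 1e-6)           # keep odds finite
    p2 = np.clip(p2, 1e-6, 1 - 1e-6)
    odds = (p1 / (1 - p1)) * (p2 / (1 - p2))   # combine in odds form
    return odds / (1 + odds)                   # inverse odds function

def interpolate_height(target_xy, seeds_xy, seeds_z, seeds_p,
                       sigma_d=2.0, sigma_p=0.3):
    """Weighted sum over the surrounding seed cells, as in Eqs. (10)-(11):
    weights are Gaussian in distance and in seed probability (centered on
    probability 1 here, an assumption of this sketch)."""
    d = np.linalg.norm(seeds_xy - target_xy, axis=1)
    w = np.exp(-d ** 2 / (2 * sigma_d ** 2)) * \
        np.exp(-(1.0 - seeds_p) ** 2 / (2 * sigma_p ** 2))
    return float(np.sum(w * seeds_z) / np.sum(w))
```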
Finally, we obtained 912 training samples and 320 test samples. Each sample contains a grid-based LiDAR representation, a road segmentation label, a dense road height label, and a road topology label.

B. DATA PREPROCESS AND AUGMENTATION
In the training and test phases, the samples are preprocessed before being fed to the network. The number of points $N_i$ is transformed into the density feature by (14), and the height values are normalized to [−1, 1] by (15); accordingly, the height value predicted in the test phase is multiplied by 5 to obtain the real height value. In the training phase, the data are augmented by random left-right flipping, which extends the training samples and reduces overfitting of the network.

IV. EXPERIMENTS AND RESULTS
In this section, we evaluate the effectiveness of the proposed LMRoadNet on the proposed multi-task dataset MultiRoad.
All experiments are trained on the training set and evaluated on the test set. The results on the ORS, DHE, and RTR tasks are reported both qualitatively and quantitatively; the quantitative evaluation uses an interpretable error metric for each task. We also report various ablation studies to shed light on the effects of our design decisions. In addition, we compare with other state-of-the-art efficient methods in terms of accuracy and runtime. Finally, we show the strategy of extending the field of view in real applications.

B. IMPLEMENTATION DETAILS
We perform all our experiments in PyTorch [34] with a CUDA 10.0 and cuDNN backend. All experiments are run on a single NVIDIA GTX-1080Ti GPU. Due to the GPU memory limitation, we use a maximum batch size of 4. During optimization, we use the SGD optimizer with a weight decay of 0.0001 and momentum of 0.9. The learning rate follows the poly strategy with a start value of 0.01 and a power of 0.9. The total number of epochs is set to 80. When performing multi-task training with learnable loss weights, the learned loss weights are fixed in the last 20 epochs to obtain more robust training results. For a fair comparison of runtime and accuracy with other state-of-the-art efficient models, we re-implement them in PyTorch. We average the forward-pass time over the 320 test samples and convert the result to frames per second (FPS) as the runtime metric.
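The runtime protocol (averaging forward-pass time over the test samples and converting to FPS) can be sketched as below; the warm-up pass count and function name are our own choices.

```python
import time
import torch

def measure_fps(model, sample_shape=(1, 5, 460, 300), n_samples=320,
                device='cpu'):
    """Average the forward-pass time over n_samples inputs and return
    frames per second. CUDA kernels are asynchronous, so we synchronize
    around the timed region when running on a GPU."""
    model = model.to(device).eval()
    x = torch.randn(*sample_shape, device=device)
    with torch.no_grad():
        for _ in range(10):                    # warm-up passes
            model(x)
        if device != 'cpu':
            torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(n_samples):
            model(x)
        if device != 'cpu':
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - t0
    return n_samples / elapsed                 # frames per second
```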

C. EXPERIMENTAL RESULTS OF THE PROPOSED LMROADNET ALGORITHM
The performance of the network was evaluated on the test set using the metrics for each task described above. A summary of the results is given in Table 1, which shows the single-task baselines in the first part and the multi-task results in the second part. As shown in Table 1, our method achieves significant gains over the single-task baselines, illustrating that the tasks benefit from each other when trained jointly. When trained in dual-task mode, the performance of the ORS and DHE tasks both improve, as they are more complementary tasks; the same holds for the ORS and RTR tasks. However, the DHE and RTR tasks do not follow this trend, because the features they need are not as closely related. When trained in triple-task mode, all tasks obtain performance gains compared to the single-task baselines; in particular, the RTR task improves by 27.5% in mIoU. Qualitative results are shown in Figure 4, where we observe high-quality results for the three tasks in different situations. In the first row of the figure, red denotes false negatives, blue false positives, and green true positives. The samples show that the road region is well detected by the ORS task, even in occupied road areas occluded by on-road objects (e.g., vehicles, pedestrians) and in empty road areas caused by occlusion or the sparsity of the point cloud. The second row visualizes the results of the DHE task, in which different colors encode different heights in the vehicle coordinate frame. The estimated height values are very smooth and consistent with the actual road conditions. The last row shows the predictions of road topology, demonstrating that our method is capable of handling various driving scenes.

D. COMPARISON OF DIFFERENT LOSS WEIGHTS STRATEGIES
In this experiment, we trained the proposed LMRoadNet with loss functions that differ in whether the loss weight of each task is learnable or fixed during the training phase. As shown in Table 2, when trained with the learnable loss weight strategy, the proposed model improves ACC by 1.6% and F1 score by 3.4% on the ORS task, decreases the L1 error by 8.8 cm on the DHE task, and increases mIoU by 13.3% on the RTR task over the model trained with fixed loss weights. Moreover, referring to Table 1, in the fixed loss weight training mode, inappropriate loss weights may hurt the performance of the tasks, even below that of the single-task baselines.

E. COMPARISON OF DIFFERENT VARIANTS OF MULTI-TASK ARCHITECTURE
To evaluate the efficiency of our multi-task network architecture, we implement two variants of the network and compare their performance on the proposed MultiRoad dataset. The first variant, referred to as base-1, uses a shared encoder and three task-specific decoders, as in the conventional multi-task architecture. Its decoder for the DHE task is almost the same as that of the ORS task except for the output channel. To obtain a larger receptive field for the RTR task, that branch adopts the fifth phase of MixNet before the decoder. The other variant, referred to as base-2, drops the last two feature maps in the decoder phase of LMRoadNet. The results are shown in Table 3. Compared to base-1, the proposed LMRoadNet dramatically decreases the number of parameters and GFLOPs while significantly increasing the performance of the tasks, indicating that our network architecture is superior to the conventional multi-task architecture. Base-2 shows a significant performance decrease on the DHE and RTR tasks, demonstrating that the last two feature maps of the main task are beneficial to these two tasks, which proves the effectiveness of our approach.

F. COMPARISON WITH OTHER STATE-OF-ART EFFICIENT METHODS
For a fair comparison of accuracy and runtime with other state-of-the-art efficient models on the same tasks, we re-implement them in PyTorch and train and evaluate them on the proposed multi-task dataset. In [5], a deep learning approach named LoDNN was developed to carry out road detection using only LiDAR data encoding several basic statistics in a top-view representation, which is very similar to our ORS task except that our method aims to detect the full road area despite occlusion. Reference [24] shows that road priors (e.g., road segmentation and ground height) can boost the performance and robustness of modern 3D object detectors and proposes a map prediction module that predicts the road segmentation and estimates the ground height separately with the same network; these are referred to as UNet-seg and UNet-hei respectively in our paper. In order to take advantage of both LiDAR and camera sensors for lane detection in 3D space, [35] proposes a fast convolutional neural network (CNN) based on ResNet50 [28] to predict a dense ground height from the LiDAR input. As shown in Table 4, our model delivers competitive performance on the ORS and DHE tasks with significantly fewer parameters and GFLOPs than the baseline models.
With regard to the ORS task, our model outperforms UNet-seg by 0.3% in ACC and 0.6% in F1 score, and LoDNN by 0.5% in ACC and 0.9% in F1 score. The qualitative results of these models on the test set are shown in Figure 5, from which we can see that LMRoadNet detects the road area more precisely, both overall and in detail. Moreover, the performance of the models at different distances from the ego vehicle is calculated and shown in Table 5. The performance gap between LMRoadNet and the baseline models widens as the distance increases, indicating that the proposed LMRoadNet outperforms the others in the difficult distant areas where the LiDAR points become increasingly sparse.
As for the DHE task, although our model performs slightly worse, it is competitive with the other models given that its parameters and GFLOPs are dramatically smaller, as shown in Table 4. In addition, as can be seen from Figure 6, the models share the same trend as in the ORS task as the distance increases, and the proposed model achieves a remarkable L1 error of less than 6 cm within 30 meters.

G. EXPANDING FIELD OF VIEW IN REAL APPLICATION
In practice, the mere view in front of the ego vehicle may not be sufficient for complex autonomous driving tasks. Therefore, we propose a fusion strategy that processes multiple grids in different directions. Compared to directly processing one larger grid, this strategy not only effectively and flexibly expands the field of view, but also avoids processing distant areas that require less attention. Specifically, it takes about 42 ms to process one 920 × 920 grid but only 26 ms to process four 460 × 300 grids in the front, back, left, and right directions, a speed boost of nearly 1.5×. The far corner areas contain vast numbers of empty cells that can be ignored most of the time. Moreover, fusing the results from different predictions in the overlapping regions may boost performance. In these regions, for the ORS task, a binary Bayes filter is applied to fuse the predictions from different directions; for the DHE task, we use the average of the estimates in each direction as the final result. Figure 7 shows the combined detection results of some samples in the two directions of front and back, and in the four directions of front, back, left, and right. It can be seen that our model performs well in different directions and obtains a larger field of view around the ego vehicle, accurately representing the surrounding road both semantically and geometrically. Some unstructured roads are well detected, which shows the effectiveness of the proposed method. The field of view can be expanded depending on the computing power of the on-board processor and the complexity of the environment, which is highly flexible and effective for practical autonomous driving applications.
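The per-cell fusion in the overlapping regions can be sketched as follows: a binary Bayes filter (implemented as a log-odds sum) for the ORS road probabilities and a plain average for the DHE heights. The function name and clipping bounds are our own choices.

```python
import numpy as np

def fuse_overlap(road_probs, heights):
    """Fuse predictions from different directions in an overlapping region.
    road_probs: list of per-cell road probability maps (same shape).
    heights:    list of per-cell height maps (same shape)."""
    p = np.clip(np.asarray(road_probs, dtype=np.float64), 1e-6, 1 - 1e-6)
    log_odds = np.sum(np.log(p / (1 - p)), axis=0)   # Bayes filter: sum evidence
    fused_road = 1.0 / (1.0 + np.exp(-log_odds))     # back to probability
    fused_height = np.mean(np.asarray(heights, dtype=np.float64), axis=0)
    return fused_road, fused_height
```

Two directions that agree reinforce each other (two cells at 0.8 fuse to about 0.94), while a confident and an uncertain prediction yield roughly the confident one, which is the intended behavior of the Bayes filter.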

V. CONCLUSIONS
In this work, we propose a dedicated multi-task algorithm named LMRoadNet for comprehensive road perception from a single LiDAR sweep in real time. The tasks cover both the semantic and the geometric information of the road surrounding the ego vehicle, which are essential for autonomous driving. After joint optimization on the proposed multi-task dataset, MultiRoad, using a combined loss function with a learnable loss weight for each task, LMRoadNet achieves 97.4% accuracy and a 94.2% F1 score on the ORS task, a 6.4 cm L1 error on the DHE task, and 84.1% mIoU on the RTR task. The experimental results show that LMRoadNet is capable of detecting the integrated road region despite occlusion, estimating the dense road height accurately, and recognizing the type of road in various driving scenes. In addition, substantial experiments demonstrate that our elaborately designed architecture outperforms the conventional multi-task architecture and is comparable to state-of-the-art efficient models. More importantly, the high effectiveness and real-time performance of the proposed model, together with the fusion strategy to efficiently and flexibly expand the field of view, make our method highly applicable to practical autonomous driving.