I. INTRODUCTION
The production of high-precision maps [1], [2] relies on lidar sensors and a large amount of manual annotation, which makes it costly and requires complex post-processing [3]. Consequently, vision-based bird's-eye-view (BEV) perception has been widely adopted for map element detection as a way to reduce reliance on high-precision maps. Most existing monocular BEV map segmentation algorithms follow a three-stage pipeline: encoding image features, transforming the image features into the BEV space, and decoding the BEV features [4]–[6]. This work focuses on improving the View Transformation Module, the second stage of this pipeline, in monocular map segmentation algorithms.