Deep Learning Techniques for Urban Scene Parsing
Abstract:
Dense segmentation tasks, including semantic, instance, and panoptic segmentation, are essential for improving our comprehension of urban landscapes. This paper examines various deep learning methodologies for enhancing dense segmentation in urban scene parsing. It also delves into the challenges of adapting these deep learning models for practical, real-world applications and explores enhancements that address them. Multiple urban scene parsing datasets were analyzed, with Cityscapes and Mapillary Vistas identified as the most suitable; of the two, Mapillary Vistas offers greater diversity and complexity. Although Vision Transformers (ViTs) are computationally demanding, they exhibit superior performance in dense segmentation tasks. Prominent techniques that enhance ViT performance, such as encoder-decoder architectures, atrous convolutions, Atrous Spatial Pyramid Pooling (ASPP), dual decoders, and mask transformers, are reviewed. Ensuring that ViT models perform well across varied domains and fine-tuning them for practical, real-world applications remain critical challenges in dense segmentation for urban scene parsing. Among the attention-based approaches explored to ensure that models concentrate on regions of interest, Masked Attention and Visual Prompt Tuning have shown the strongest results. Nonetheless, considerable potential remains for improving generalization across diverse urban datasets and for increasing overall efficiency.
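Of the techniques named above, atrous (dilated) convolution is the simplest to illustrate: it enlarges the receptive field by inserting gaps between kernel taps, without adding weights or reducing resolution, which is why ASPP stacks several dilation rates in parallel. The following is a minimal one-dimensional sketch in plain Python, purely illustrative (segmentation models such as those in the DeepLab family apply this in 2D over feature maps); the function name and signal values are hypothetical:

```python
def atrous_conv1d(signal, kernel, dilation=1):
    """1-D atrous (dilated) convolution with 'valid' padding.

    With dilation d, a kernel of size k covers a receptive field of
    d*(k-1)+1 input samples while still using only k weights.
    """
    k = len(kernel)
    span = dilation * (k - 1) + 1  # effective receptive field
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for j in range(k):
            # Sample the input every `dilation` steps instead of contiguously.
            acc += kernel[j] * signal[start + j * dilation]
        out.append(acc)
    return out

x = [1, 2, 3, 4, 5, 6]
w = [1, 0, -1]
print(atrous_conv1d(x, w, dilation=1))  # kernel spans 3 samples
print(atrous_conv1d(x, w, dilation=2))  # same 3 weights span 5 samples
```

The same three weights see a wider context at dilation 2, which is the property ASPP exploits by running branches at several rates and fusing their outputs.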
Published in: IEEE Access (Volume 13)