Action-ViT: Pedestrian Intent Prediction in Traffic Scenes


Abstract:

Pedestrian crossing intention prediction is crucial to traffic safety, yet it remains a challenging task in real traffic scenarios. Traditional methods infer a pedestrian's intention to cross by predicting future movements from observed past trajectories. Their performance is limited by insufficient features and sources of information. To address these limitations, we propose a ViT-based model that incorporates multi-modal data to predict pedestrian crossing intention. Specifically, the proposed model takes into account visual information, poses, bounding box coordinates, and action annotations, and gradually fuses these features for the final prediction. In addition, a separate data processing method is designed for each modality, based on its characteristics, to make full use of each type of data. Extensive ablation studies are conducted to show the effect of temporal modelling and feature fusion.
Published in: IEEE Signal Processing Letters (Volume: 29)
Page(s): 324 - 328
Date of Publication: 10 December 2021
