Abstract:
In this traffic-scene-modeling study, we propose an image-captioning network that incorporates element attention into an encoder-decoder mechanism to generate more reasonable scene captions. A visual-relationship-detecting network is also developed to detect the relative positions of object pairs. First, the traffic scene elements are detected and segmented according to their clustered locations. The image-captioning network is then applied to generate a description of each traffic scene element, and the visual-relationship-detecting network is used to detect the position relations of all object pairs in each subregion. The static and dynamic traffic elements are selected and organized to construct a 3D model according to the captions and the position relations. The reconstructed 3D traffic scenes can be used for the offline testing of unmanned vehicles. Evaluations and comparisons on the TSD-max, KITTI, and Microsoft COCO datasets demonstrate the effectiveness of the proposed framework.
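To make the encoder-decoder-with-attention idea concrete, below is a minimal sketch, not the authors' implementation, of a captioning decoder that attends over per-element visual features at each decoding step. It assumes PyTorch; the class names (`ElementAttention`, `CaptionDecoder`), feature dimensions, and additive-attention form are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of attention-based captioning over detected scene
# elements; dimensions and module names are assumptions, not the paper's.
import torch
import torch.nn as nn

class ElementAttention(nn.Module):
    """Additive attention over per-element features (illustrative)."""
    def __init__(self, feat_dim, hid_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hid_proj = nn.Linear(hid_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (batch, n_elements, feat_dim); hidden: (batch, hid_dim)
        e = self.score(torch.tanh(
            self.feat_proj(feats) + self.hid_proj(hidden).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)        # weight per scene element
        context = (alpha * feats).sum(dim=1)   # weighted element context
        return context, alpha.squeeze(-1)

class CaptionDecoder(nn.Module):
    """LSTM decoder that re-attends to element features at every step."""
    def __init__(self, vocab_size, feat_dim=2048, emb_dim=256, hid_dim=512):
        super().__init__()
        self.hid_dim = hid_dim
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.attn = ElementAttention(feat_dim, hid_dim, attn_dim=256)
        self.lstm = nn.LSTMCell(emb_dim + feat_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, feats, tokens):
        # feats: (batch, n_elements, feat_dim); tokens: (batch, seq_len)
        b = feats.size(0)
        h = feats.new_zeros(b, self.hid_dim)
        c = feats.new_zeros(b, self.hid_dim)
        logits = []
        for t in range(tokens.size(1)):  # teacher-forced decoding
            ctx, _ = self.attn(feats, h)
            step_in = torch.cat([self.embed(tokens[:, t]), ctx], dim=1)
            h, c = self.lstm(step_in, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)

# Example: 2 images, 4 detected elements each, 10-step decode.
feats = torch.randn(2, 4, 2048)
tokens = torch.randint(0, 1000, (2, 10))
print(CaptionDecoder(vocab_size=1000)(feats, tokens).shape)  # (2, 10, 1000)
```

The key point the sketch illustrates is that the attention weights are recomputed from the decoder's hidden state at every step, so each generated word can focus on a different detected traffic element.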
Published in: IEEE Transactions on Intelligent Transportation Systems ( Volume: 23, Issue: 7, July 2022)