I. Introduction
Training models based on the Transformer [1] or Convolutional Neural Networks (CNN) [2], which are in the spotlight in the field of computer vision, requires a large amount of data. In addition, to apply such a model in the real world, it is necessary to collect and construct data from the real environment. However, this is challenging because it takes a lot of time and cost. To address this challenge, we study the use of virtual synthetic data, which is relatively inexpensive, and then study how to improve the model so that it can operate in a real-world driving environment.