I. Introduction
In the last decade, the robotics community has seen substantial breakthroughs and developments. One domain that has improved markedly is simultaneous localization and mapping (SLAM), which has enabled real-world applications such as autonomous driving. Effective localization and mapping rely heavily on robust place recognition (PR), also known as loop closure detection (LCD), especially for long-term navigation tasks [1], [2]; this remains one of the major challenges for current SLAM systems. Visual place recognition is a PR method that uses a camera to match two observed scenes. The problem is challenging because the same scene appears differently under different seasonal or weather conditions. Moreover, the same place can look different when observed from different viewpoints, which occurs frequently during SLAM, since there is no guarantee that a robot will always observe a given local scene from the same viewpoint. A robot performing SLAM must cope with all of these challenges.