1. Introduction
Monocular Visual Odometry (VO) methods, which recover ego-motion from a sequence of images, have mostly been studied within a restricted scope, where a single dataset, such as KITTI [28], is used for both training and evaluation with a fixed, pre-calibrated camera [37], [45], [54], [77], [109], [112], [116], [124], [126], [128]. In contrast, very few studies have analyzed the task of generalized VO, i.e., relative pose estimation at real-world scale across differing scenes and capture setups.