CVML-Pose: Convolutional VAE Based Multi-Level Network for Object 3D Pose Estimation

This video demonstrates the proposed CVML-Pose method and contains two parts. The first part shows a real-time demonstration of the 3D pose estimation algorithm. The second part...

Abstract:

Most vision-based 3D pose estimation approaches rely on knowledge of the object’s 3D model and on depth measurements, and often require time-consuming iterative refinement to improve accuracy. These requirements limit their use in broader real-life applications, and addressing them is the main motivation for this paper. To this end, a novel Convolutional Variational Auto-Encoder based Multi-Level Network for object 3D pose estimation (CVML-Pose) is proposed. Unlike most other methods, CVML-Pose implicitly learns an object’s 3D pose from RGB images alone, encoding it in its latent space, without requiring the object’s 3D model or depth information and without post-refinement. CVML-Pose consists of two main modules: (i) CVML-AE, a convolutional variational autoencoder that extracts features from RGB images, and (ii) Multi-Layer Perceptron (MLP) and K-Nearest Neighbor (KNN) regressors that map the latent variables to the object’s 3D rotation and translation, respectively. CVML-Pose has been evaluated on the LineMod and LineMod-Occlusion benchmark datasets. It is shown to outperform other methods based on latent representations and to achieve results comparable to the state of the art, without using a 3D model or depth measurements. Visualization with the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm shows that the CVML-Pose latent space successfully represents objects’ categories and topology. This opens up the prospect of jointly estimating pose and other attributes (possibly including surface finish or shape variations), which, combined with the real-time processing enabled by the absence of iterative refinement, can facilitate various robotic applications. Code available: https://github.com/JZhao12/CVML-Pose.
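The two-stage idea described in the abstract (a variational autoencoder produces latent codes, then regressors map those codes to pose) can be illustrated with a minimal pure-Python sketch. It assumes a standard VAE formulation with a diagonal-Gaussian latent (the reparameterization trick and the KL regularizer toward N(0, I)) and a plain KNN regressor that uses Euclidean distance and an unweighted mean of the k nearest training translations. The function names, the value of k, and the toy data are hypothetical and are not taken from the CVML-Pose code; the convolutional encoder/decoder and the rotation MLP are omitted.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps, eps ~ N(0, 1) per dimension, sigma = exp(0.5 * logvar)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), the usual VAE regularizer."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def knn_regress(train_latents, train_targets, query, k=3):
    """Predict a target vector (e.g. a translation) as the unweighted mean of the
    targets belonging to the k training latent codes nearest to `query`."""
    # Pair each training code's Euclidean distance to the query with its index.
    ranked = sorted((math.dist(z, query), i) for i, z in enumerate(train_latents))
    nearest = [train_targets[i] for _, i in ranked[:k]]
    dim = len(nearest[0])
    # Average each coordinate over the selected neighbours.
    return [sum(t[d] for t in nearest) / len(nearest) for d in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical encoder outputs for one image (real latents are learned and larger).
    mu, logvar = [0.2, -0.1], [-1.0, -0.5]
    print("sampled z:", reparameterize(mu, logvar, rng))
    print("KL term:", kl_to_standard_normal(mu, logvar))

    # Toy latent codes paired with toy translation vectors.
    train_z = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9]]
    train_t = [[0.0, 0.0, 500.0], [2.0, 0.0, 500.0],
               [50.0, 40.0, 900.0], [52.0, 38.0, 910.0]]
    print(knn_regress(train_z, train_t, [0.05, 0.0], k=2))  # -> [1.0, 0.0, 500.0]
```

A nonparametric KNN regressor keeps this stage simple: once the latent space clusters similar views together, a translation estimate can be read off the neighbouring training samples without iterative refinement. Whether CVML-Pose weights neighbours or normalizes latents differently is not stated here.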
Published in: IEEE Access ( Volume: 11)
Page(s): 13830 - 13845
Date of Publication: 08 February 2023
Electronic ISSN: 2169-3536
