
HandVoxNet++: 3D Hand Shape and Pose Estimation Using Voxel-Based Neural Networks


Abstract:

3D hand shape and pose estimation from a single depth map is a new and challenging computer vision problem with many applications. Existing methods addressing it directly regress hand meshes via 2D convolutional neural networks, which leads to artifacts due to perspective distortions in the images. To address the limitations of the existing methods, we develop HandVoxNet++, i.e., a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner. The input to our network is a 3D voxelized depth map based on the truncated signed distance function (TSDF). HandVoxNet++ relies on two hand shape representations. The first one is the 3D voxelized grid of hand shape, which does not preserve the mesh topology and which is the most accurate representation. The second representation is the hand surface that preserves the mesh topology. We combine the advantages of both representations by aligning the hand surface to the voxelized hand shape either with a new neural Graph-Convolutions-based Mesh Registration (GCN-MeshReg) or a classical segment-wise Non-Rigid Gravitational Approach (NRGA++), which does not rely on training data. In extensive evaluations on three public benchmarks, i.e., SynHand5M, the depth-based HANDS19 challenge and HO-3D, the proposed HandVoxNet++ achieves state-of-the-art performance. In this journal extension of our previous approach presented at CVPR 2020, we gain 41.09% and 13.7% higher shape alignment accuracy on the SynHand5M and HANDS19 datasets, respectively. Our method is ranked first on the HANDS19 challenge dataset (Task 1: Depth-Based 3D Hand Pose Estimation) as of the submission of our results to the portal in August 2020.
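The TSDF-based input encoding mentioned in the abstract can be illustrated with a short sketch. The following Python/NumPy snippet is a minimal, hypothetical example of converting a single depth map into a truncated signed distance function voxel grid; it is not the authors' implementation, and the grid resolution, cube size, truncation distance and camera intrinsics (fx, fy, cx, cy) are illustrative assumptions.

    # Hypothetical sketch (not the authors' code): depth map -> TSDF voxel grid.
    import numpy as np

    def depth_to_tsdf(depth, fx, fy, cx, cy, grid_size=64, trunc=0.03, cube=0.3):
        """Project each voxel centre into the depth image and store a truncated,
        normalised signed distance along the camera ray (+1 = free space)."""
        # Estimate the hand centroid in camera coordinates from valid depth pixels.
        v, u = np.nonzero(depth > 0)
        z = depth[v, u]
        centroid = np.array([((u - cx) * z / fx).mean(),
                             ((v - cy) * z / fy).mean(),
                             z.mean()])

        # Voxel centres of a cube of side `cube` metres centred on the centroid.
        lin = (np.arange(grid_size) + 0.5) / grid_size - 0.5
        xs, ys, zs = np.meshgrid(lin, lin, lin, indexing="ij")
        pts = np.stack([xs, ys, zs], axis=-1) * cube + centroid      # (G, G, G, 3)

        # Perspective projection of the voxel centres into the depth image.
        px = np.round(pts[..., 0] * fx / pts[..., 2] + cx).astype(int)
        py = np.round(pts[..., 1] * fy / pts[..., 2] + cy).astype(int)
        valid = (px >= 0) & (px < depth.shape[1]) & (py >= 0) & (py < depth.shape[0])

        # Signed distance between observed depth and voxel depth, truncated to [-1, 1].
        d_obs = np.where(valid, depth[py.clip(0, depth.shape[0] - 1),
                                      px.clip(0, depth.shape[1] - 1)], 0.0)
        tsdf = np.ones((grid_size,) * 3, dtype=np.float32)           # unobserved/free
        seen = valid & (d_obs > 0)
        tsdf[seen] = np.clip((d_obs[seen] - pts[..., 2][seen]) / trunc, -1.0, 1.0)
        return tsdf

Such a grid (e.g., 64x64x64) can then be fed to 3D convolutions, which is the general idea behind voxel-based hand networks of this kind; the exact voxelization used by HandVoxNet++ is described in the paper itself.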
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 44, Issue: 12, 01 December 2022)
Page(s): 8962–8974
Date of Publication: 02 November 2021

PubMed ID: 34727024

