I. Introduction
The hand is one of the most important means by which humans interact with the world. Hence, the tasks of estimating human hand pose and understanding hand actions from images (or video) play an important role in computer vision. These tasks have many applications, such as controlling smart home devices [1] and rehabilitation assessment in medicine [2].