I. Introduction
Advances in machine learning and deep learning have driven rapid progress in domains such as computer vision, natural language processing, and human-computer interaction. Augmented reality (AR), which combines computer vision, sensor technology, and multimodal human-computer interaction, has become widely popular; it lets users interact with three-dimensional virtual worlds and delivers highly immersive experiences. Unlike interaction through conventional input devices, AR supports real-time interaction with virtual objects through body gestures, offering a more natural and human-centric experience.