1. Introduction
Human behaviour analysis from skeleton-based data has been widely investigated for decades. The advent of deep learning architectures has further increased its popularity, mainly because skeleton data are robust to dynamic circumstances, appearance variations, and cluttered backgrounds. Over the last decade, with the rise of data-driven approaches, performance has become highly correlated with the scale of the training set. Generating high-quality synthetic human actions can therefore help address the problem of limited data. However, existing methods remain severely limited, particularly in conditioning generation on desired actions and in treating generation at the global movement level.