I. Introduction
Variational autoencoders (VAEs) [24] have attracted significant interest for their ability to learn continuous, smooth latent distributions from observations by combining probabilistic inference with deep neural networks. VAEs offer notable advantages: they incorporate prior knowledge, map inputs to probabilistic representations, and approximate the likelihood of outputs. By integrating the stochastic gradient variational Bayes (SGVB) estimator [24] with neural networks, a VAE learns a compact probabilistic latent space that captures more representative attributes of the data. VAEs have been applied in various domains, including time series forecasting [13], out-of-domain detection in images [17], [30], [31], [39], generating images with spiking signals [21], and generating text by language modeling [42]. Beyond generative tasks, VAEs are widely used in representation learning, particularly for disentanglement [33], [41], classification [20], [35], clustering [40], and manifold learning [1], [9]. However, VAEs still face significant challenges, in particular learning an appropriate tradeoff between representation compression and generation fitting.
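To make this tradeoff concrete, recall the standard evidence lower bound (ELBO) maximized by a VAE [24]; the notation below follows the usual formulation and is shown only for illustration:
\[
\mathcal{L}(\theta, \phi; \mathbf{x}) = \mathbb{E}_{q_\phi(\mathbf{z}\mid\mathbf{x})}\big[\log p_\theta(\mathbf{x}\mid\mathbf{z})\big] - D_{\mathrm{KL}}\big(q_\phi(\mathbf{z}\mid\mathbf{x}) \,\|\, p(\mathbf{z})\big),
\]
where the first (reconstruction) term governs generation fitting and the second (KL) term compresses the approximate posterior toward the prior; emphasizing one term typically degrades the other, which is the tension this work addresses.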