Abstract:
Disentangled representation learning seeks to align individual dimensions or separate groups of coordinates of latent factors with attributes of observed data such that perturbing certain latent factors uniquely changes particular attributes. A main challenge in unsupervised disentanglement using autoencoders is that strong regularisation, while necessary for consistent disentanglement, comes at the expense of accurate data reconstruction. To address this, we introduce a teacher-student framework that incorporates a variational sequential autoencoder and a Jacobian constraint that regularises the variation of observations relative to latent factors. In real-world audio recordings of musical instruments, our approach outperforms a state-of-the-art method in both sampling quality and unsupervised pitch-timbre disentanglement.
Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024