Disentangled Speech Representation Learning Based on Factorized Hierarchical Variational Autoencoder with Self-Supervised Objective | IEEE Conference Publication | IEEE Xplore