Skip to Main Content
Discovering a representation that reflects the structure of a dataset is a first step for many inference and learning methods. This paper aims at finding a hierarchy of localized speech features that can be interpreted as parts. Non-negative matrix factorization (NMF) has been proposed recently for the discovery of parts-based localized additive representations. The author proposes a variant of this method, convolutional NMF, that enforces a particular local connectivity with shared weights. Analysis starts from a spectrogram. The hidden representations produced by convolutional NMF are input to the same analysis method at the next higher level. Repeated application of convolutional NMF yields a sequence of increasingly abstract representations. These speech representations are parts-based, where complex higher-level parts are defined in terms of less complex lower-level ones.