We propose an approximate Bayesian approach for unsupervised feature selection and density estimation, in which the importance of each feature for clustering serves as the feature-selection criterion. Traditional maximum-likelihood (ML) parameter-optimization schemes estimate the feature saliencies for a fixed model structure (i.e., a fixed number of clusters). In practice, the number of clusters present in the data is unknown, so in an ML framework it typically has to be determined before the feature saliencies can be estimated. We propose a density estimation scheme that addresses model complexity (the number of clusters present) and model-parameter estimation (the feature saliencies) in a single optimization framework. The approximate Bayesian approach presented here, based on the expectation propagation method, obtains a full posterior distribution on the saliency of the features, along with a full posterior distribution on the other model parameters (including the number of clusters) that represent the underlying statistics of the data. The performance of the algorithm is analyzed based on its ability to identify the features that are salient for clustering multivariate data.
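To make the notion of a feature saliency concrete, the following is a minimal sketch of one common formulation from the mixture-modeling literature, and not necessarily the exact model used here. It assumes that each feature is drawn from a cluster-specific Gaussian with probability equal to its saliency, and from a cluster-independent "common" Gaussian otherwise. All function and parameter names are hypothetical, and the sketch only evaluates the density for given parameters; it does not perform the expectation propagation inference.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density, evaluated elementwise."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def saliency_mixture_density(x, weights, means, variances,
                             saliency, bg_mean, bg_var):
    """Density of one point x (shape (D,)) under a feature-saliency mixture.

    Assumed (illustrative) parameterization:
      weights:   (K,)   mixing proportions, summing to 1
      means:     (K, D) cluster-specific means
      variances: (K, D) cluster-specific variances
      saliency:  (D,)   probability that each feature is relevant for clustering
      bg_mean, bg_var: (D,) parameters of the common, cluster-independent density
    """
    # Density of each feature under each cluster's own Gaussian: shape (K, D).
    relevant = gaussian_pdf(x[None, :], means, variances)
    # Density of each feature under the shared background Gaussian: shape (D,).
    common = gaussian_pdf(x, bg_mean, bg_var)
    # Per-feature blend of the two, weighted by the saliency: shape (K, D).
    per_feature = saliency * relevant + (1.0 - saliency) * common[None, :]
    # Features are assumed independent given the cluster, so multiply over D,
    # then average over clusters with the mixing proportions.
    return float(np.sum(weights * np.prod(per_feature, axis=1)))
```

Under this assumed form, a saliency of 1 for every feature recovers an ordinary diagonal-covariance Gaussian mixture, while a saliency of 0 for every feature makes the density independent of the cluster structure, which is why a saliency near 0 marks a feature as uninformative for clustering.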