Skip to Main Content
Parzen Windows (PW) is a popular nonparametric density estimation technique. In general the smoothing kernel is placed on all available data points, which makes the algorithm computationally expensive when large datasets are considered. Several approaches have been proposed in the past to reduce the computational cost of PW either by subsampling the dataset, or by imposing a sparsity in the density model. Typically the latter requires a rather involved and complex learning process. In this paper, we propose a new simple and efficient kernel-based method for non-parametric probability density function (pdf) estimation on large datasets. We cover the entire data space by a set of fixed radii hyper-balls with densities represented by full covariance Gaussians. The accuracy and efficiency of the new estimator is verified on both synthetic dataset and large datasets of astronomical simulations of the galaxy disruption process. Experiments demonstrate that the estimation accuracy of the new estimator is comparable to that of the previous approaches but with a significant speed-up. We also show that the pdf learnt by the new estimator could used to automatically find the most matching set in large scale astronomical simulations.