This paper proposes ParzenFCMC, which uses the fuzzy c-means clustering procedure to find a condensed set for Parzen windows estimation. The full Parzen windows estimator is usually costly in both computation and storage. However, experimental simulations show that significantly increasing the amount of reference data may not noticeably improve the estimation performance of the Parzen windows method. In addition, theoretical analysis shows that the traditional Parzen windows estimator is sensitive to noisy data. Thus, to improve the generalization capability (i.e., the adaptability to noisy data) of Parzen windows estimation, we find a condensed dataset on which to conduct the probability density estimation in two steps: 1) clustering the original dataset using fuzzy c-means; 2) estimating the underlying density function from the condensed reference set. Finally, experimental results on synthetic datasets following Uniform, Normal, Exponential, and Rayleigh distributions demonstrate the usefulness and effectiveness of the proposed ParzenFCMC: significant savings in computation and storage can be achieved with only minimal degradation in mean integrated squared error (MISE).
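The two steps above can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation. The abstract does not specify the kernel, the bandwidth, the number of clusters, or how the condensed set is weighted, so the Gaussian kernel, the fixed bandwidth `h`, the fuzzifier `m = 2`, the choice `c = 50`, the equal weighting of cluster centers, and the function names `fuzzy_c_means` and `parzen_density` are all assumptions made for illustration.

```python
import numpy as np

def fuzzy_c_means(x, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means on 1-D data; returns the c cluster centers (sorted)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Assumption: initialise centers from randomly chosen data points.
    centers = rng.choice(x, size=c, replace=False)
    for _ in range(n_iter):
        # Distances from every point to every center, shape (n, c).
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12
        # Membership u_ik is proportional to d_ik^(-2/(m-1)); rows sum to 1.
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
        # Each center is the (u^m)-weighted mean of the data.
        w = u ** m
        new_centers = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
        converged = np.max(np.abs(new_centers - centers)) < tol
        centers = new_centers
        if converged:
            break
    return np.sort(centers)

def parzen_density(query, reference, h):
    """Gaussian-kernel Parzen windows estimate evaluated at the query points."""
    query = np.asarray(query, dtype=float)
    reference = np.asarray(reference, dtype=float)
    z = (query[:, None] - reference[None, :]) / h
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    # Assumption: every reference point carries equal weight.
    return k.mean(axis=1) / h

# Synthetic Normal data, one of the four distributions named in the abstract.
rng = np.random.default_rng(42)
data = rng.normal(0.0, 1.0, size=2000)

# Step 1: condense the reference set to c cluster centers.
condensed = fuzzy_c_means(data, c=50)

# Step 2: estimate the density from the full and the condensed sets.
grid = np.linspace(-5.0, 5.0, 501)
dx = grid[1] - grid[0]
h = 0.3  # hypothetical bandwidth, not taken from the paper
full = parzen_density(grid, data, h)
small = parzen_density(grid, condensed, h)

# Integrated squared error against the true N(0, 1) density on the grid.
true_pdf = np.exp(-0.5 * grid ** 2) / np.sqrt(2.0 * np.pi)
ise_full = np.sum((full - true_pdf) ** 2) * dx
ise_small = np.sum((small - true_pdf) ** 2) * dx
print("reference points:", data.size, "vs", condensed.size)
print("ISE full:", ise_full, "ISE condensed:", ise_small)
```

In this sketch the condensed estimator evaluates 50 kernels per query point instead of 2000, which is where the computation and storage savings would come from. Averaging the squared error over many independently drawn datasets would approximate the MISE reported in the paper.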
Published in:
Software Engineering and Service Science (ICSESS), 2012 IEEE 3rd International Conference on
Date of Conference: 22-24 June 2012