In many application domains, such as information retrieval, computational biology, and image processing, the data dimension is usually very high. Developing effective clustering methods for high-dimensional datasets is a challenging problem because of the curse of dimensionality. The k-means clustering algorithm is used in many practical applications, but it is computationally expensive, and the quality of the resulting clusters depends heavily on the selection of the initial centroids and on the dimension of the data. When the dimensionality of the dataset is high, the accuracy of the result may fall short of expectations, because the chosen dataset cannot be assumed to be flawless or free of noise. It is therefore necessary to reduce the dimensionality of the given dataset in order to improve efficiency and accuracy. This paper proposes a new approach that improves the accuracy of the clustering results by using PCA both to determine the initial centroids and to reduce the dimension of the data.
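The abstract does not say how PCA is used to pick the initial centroids, so the following is only a minimal sketch under stated assumptions, not the paper's procedure. It assumes one plausible heuristic: project the data onto its leading principal components, sort the points by their first-component score, split them into k equal-sized groups, and use each group's mean as a starting centroid. The function names (`pca_reduce`, `pca_initial_centroids`, `kmeans`) and the synthetic data are hypothetical and chosen for illustration.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components, computed via SVD."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def pca_initial_centroids(Z, k):
    """ASSUMED heuristic (not taken from the abstract): sort points by their
    first principal component score, split them into k equal-sized groups,
    and return the mean of each group."""
    order = np.argsort(Z[:, 0])
    groups = np.array_split(order, k)
    return np.vstack([Z[g].mean(axis=0) for g in groups])

def kmeans(Z, centroids, n_iter=100):
    """Plain Lloyd iterations starting from the supplied centroids."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        updated = np.vstack([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Final assignment against the final centroids.
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids

# Synthetic example: three Gaussian blobs in 50 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 50))
X = np.vstack([c + rng.normal(size=(100, 50)) for c in centers])

Z = pca_reduce(X, n_components=2)      # dimension reduction
init = pca_initial_centroids(Z, k=3)   # deterministic starting centroids
labels, centroids = kmeans(Z, init)    # k-means in the reduced space
```

Because the starting centroids are derived from the data rather than drawn at random, repeated runs on the same dataset produce the same clustering. The number of retained components (2 here) is an arbitrary choice for the example; in practice it would be chosen from the explained variance.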
Date of Conference: 5-7 Aug. 2010