Fuzzy clustering is a popular method for modeling web usage data, and a number of techniques have been proposed. The performance of such techniques has been demonstrated through experiments on datasets that are often limited in size and/or variety. This is mainly due to the difficulty of acquiring large real datasets and to the considerable time and effort required to perform experiments. We investigate ways to ensure the dependability of such results and their analyses. To this end, we consider three issues. First, the clustering quality indices used to compare different techniques must not be biased towards a parameter specific to any one of them. Second, measuring quality through an application of the usage model provides more ground truth than the clustering quality index alone. Third, given the limited datasets and experiments, statistical significance testing can provide more confidence that the results obtained are not due to mere chance. We present our approach to dependable performance analysis using some well-known fuzzy clustering techniques, with prediction quality as the application-specific metric.
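As an illustration of the third issue, the following is a minimal sketch of one possible significance test, assuming two techniques have been scored for prediction quality on the same small set of data splits. The abstract does not state which test the authors use; the exact paired sign-flip (permutation) test, the function name `paired_sign_flip_test`, and the scores shown are all hypothetical choices made for this example.

```python
from itertools import product


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided paired permutation (sign-flip) test.

    Returns the p-value for the null hypothesis that the two techniques
    perform equally, using the sum of paired differences as the statistic.
    Enumerates all 2^n sign assignments, so it suits small n only.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))

    # Under the null hypothesis each paired difference is equally likely
    # to carry either sign, so every sign assignment is equally probable.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / total


# Hypothetical prediction-quality scores on five data splits
# (illustrative values only, not results from the paper).
technique_a = [0.62, 0.58, 0.65, 0.60, 0.63]
technique_b = [0.55, 0.54, 0.60, 0.57, 0.59]
p_value = paired_sign_flip_test(technique_a, technique_b)
print(f"p-value: {p_value}")
```

With only five paired runs, the smallest p-value this test can return is 2/32 = 0.0625, which shows why a small number of experiments limits how much confidence a significance test can provide.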
Date of Conference: March 30 2009-April 2 2009