Skip to Main Content
The ever increasing number of cores per chip will be accompanied by a pervasive data deluge whose size will probably increase even faster than CPU core count over the next few years. This suggests the importance of parallel data analysis and data mining applications with good multicore, cluster and grid performance. This paper considers data clustering, mixture models and dimensional reduction presenting a unified framework applicable to bioinformatics, cheminformatics and demographics. Deterministic annealing is used to lessen effect of local minima. We present performance results on clusters of 2-8 core systems identifying effects from cache, runtime fluctuations, synchronization and memory bandwidth. We discuss needed programming model and compare with MPI and other approaches.