Over the past decade, advances in computational and sensor technology have enabled us to dynamically collect vast amounts of data from observations, health screening tests, simulations, and experiments at an ever-increasing pace. Knowledge discovery and data mining is an iterative process concerned with deriving interesting, non-obvious, and useful patterns and models from such large volumes of data. Although inexpensive storage makes it easy to retain such data, accessing and managing it for knowledge discovery and data mining becomes a performance issue when datasets are large, dynamic, and distributed. In this work, we present our vision of a software framework consisting of middleware services to support interactive data mining over dynamic data at data analysis centers built on top of heterogeneous clusters. We also present the design of a sampling service for dynamic data, together with initial performance results.